# Scraping Doctor Who Episodes

## Introduction
The following program scrapes the lists of **Doctor Who seasons and episodes ([1963-1989](https://en.wikipedia.org/wiki/List_of_Doctor_Who_episodes_(1963%E2%80%931989)), [2005-Present](https://en.wikipedia.org/wiki/List_of_Doctor_Who_episodes_(2005%E2%80%93present)))** from Wikipedia.

## Table of Contents
1. Import Primary Modules
2. Scraping 1963-1989 episodes
3. Dataframe Transformation (1963-1989)
4. Scraping 2005-Present episodes
5. Dataframe Transformation (2005-Present)
6. Combining both Dataframes

### About dataset

The dataset scraped from these pages contains 861 records of broadcast Doctor Who episodes and specials from both the Classical and Revival Eras. It includes the following fields:
       
| Field          | Description                                                                                  |
|----------------|----------------------------------------------------------------------------------------------|
| Story          | The story number; not an official designation of episodes but a rough guide to the story arc |
| Serial         | Numeric classifier of a plot told sequentially across several episodes                       |
| Episode        | Numeric classifier of an episode within a season or series                                   |
| Serial title   | The main title of the serial                                                                 |
| Episode titles | The title of an episode                                                                      |
| Directed by    | The person who directed the main production                                                  |
| Written by     | The person or people who wrote the story or plot                                             |
| Prod.code      | An alphanumeric designation used to identify episodes within a television series             |
| UK viewers(M)  | The number of viewers in the UK (United Kingdom), in millions                                |
| AI             | A score out of 100 used as an indicator of the public's appreciation of a TV series          |

## Importing Primary Modules

The program relies heavily on [*pandas*](http://pandas.pydata.org/) and [**NumPy**](http://www.numpy.org/) for data wrangling and transformation.

Because the page contains multiple tables with complicated markup, **`pd.read_html`** is used instead of **`BeautifulSoup`** for faster and more efficient scraping.

In [1]:
import pandas as pd
import requests
import csv
import numpy as np

## Scraping 1963-1989 episodes.
### Data Extraction

Define the URL where the data is located.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_Doctor_Who_episodes_(1963%E2%80%931989)'

Using a **for loop**, all the tables are scraped into consecutive dataframes named **dfc_** followed by the **index** of the table, where **"c"** stands for seasons from the **_Classical Era_**. There are **26** seasons in total; however, inspection of the webpage shows that the episode tables start at index **2** and end at index **29**, which gives **range(2,30)***.

*The upper bound is 30 because `range` excludes its end value, so index 29 is still covered.

In [3]:
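#Download and parse the page once, then keep tables 2..29 as dfc_2 ... dfc_29
tables_clas = pd.read_html(url,header=0)
for c in range(2,30):
    globals()['dfc_{}'.format(c)] = tables_clas[c]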

To avoid repeating the same code for every table, all dataframes are merged with **pd.concat** into **df_clas**.

In [4]:
pdList = []
pdList.extend(value for name, value in locals().items() 
              if name.startswith('dfc_'))
df_clas = pd.concat(pdList, ignore_index=True)
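As an alternative to creating one variable per table and collecting them through `locals()`, the tables can stay in a plain list. The sketch below is only an illustration: three small hand-made frames stand in for the list that `pd.read_html(url, header=0)` returns, and the names `tables`, `episode_tables` and `combined` as well as the slice bounds are assumptions, not part of the notebook.

```python
import pandas as pd

# Stand-in for `tables = pd.read_html(url, header=0)`; the values are made up.
tables = [
    pd.DataFrame({"Story": ["x"], "Serial title": ["not an episode table"]}),
    pd.DataFrame({"Story": ["1", "1"], "Serial title": ["An Unearthly Child"] * 2}),
    pd.DataFrame({"Story": ["2"], "Serial title": ["The Daleks"]}),
]

# Slice out the episode tables (indexes 1..2 here; the notebook uses 2..29)
episode_tables = tables[1:3]

# Stack them into one frame with a fresh 0..n-1 index
combined = pd.concat(episode_tables, ignore_index=True)
print(combined)
```

Keeping the tables in a list avoids depending on variable names and makes the selected range visible in a single slice.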

## Dataframe Transformation/ Cleaning.

Dataset information prior to changes.

In [5]:
df_clas

Unnamed: 0,Story,Serial,Serial title,Episode titles,Directed by,Written by,Original air date,Prod.code,UK viewers(millions) [7],AI [7],Unnamed: 10,Unnamed: 11,Unnamed: 12
0,1,1,An Unearthly Child,"""An Unearthly Child""",Waris Hussein,Anthony Coburn and C. E. Webber (uncredited),23 November 1963,A,4.4,63,,,
1,1,1,An Unearthly Child,"""The Cave of Skulls""",Waris Hussein,Anthony Coburn,30 November 1963,A,5.9,59,,,
2,1,1,An Unearthly Child,"""The Forest of Fear""",Waris Hussein,Anthony Coburn,7 December 1963,A,6.9,56,,,
3,1,1,An Unearthly Child,"""The Firemaker""",Waris Hussein,Anthony Coburn,14 December 1963,A,6.4,55,,,
4,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
851,,,,,,,,,,,,,
852,155,4,Survival,"""Part One""",Alan Wareing,Rona Munro,22 November 1989,7P,5,69,,,
853,155,4,Survival,"""Part Two""",Alan Wareing,Rona Munro,29 November 1989,7P,4.8,69,,,
854,155,4,Survival,"""Part Three""",Alan Wareing,Rona Munro,6 December 1989,7P,5,71,,,


In [6]:
df_clas.dtypes

Story                       object
Serial                      object
Serial title                object
Episode titles              object
Directed by                 object
Written by                  object
Original air date           object
Prod.code                   object
UK viewers(millions) [7]    object
AI [7]                      object
Unnamed: 10                 object
Unnamed: 11                 object
Unnamed: 12                 object
dtype: object

In [7]:
df_clas.shape

(856, 13)

### Changing Columns and Rows Attributes

#### Removing rows and columns

In [8]:
#Removing NaN and Special rows
df_clas=df_clas.dropna(axis='rows',subset=['Serial title'])
df_clas=df_clas[df_clas['Serial title'] != 'Special']

#Removing Unnamed Columns (reassigning instead of inplace=True avoids SettingWithCopyWarning)
df_clas=df_clas.drop(columns=['Unnamed: 10','Unnamed: 11','Unnamed: 12'])


The "Shada" serial is removed from the dataframe because it was left unfinished due to a strike and was only completed and officially released on home media in 2017.

In [9]:
#Removing Shada.
df_clas.drop(df_clas[df_clas['Serial title'] == 'Shada'].index, inplace = True) 
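The row removals above can also be written as a single boolean mask. The following is a minimal sketch on a toy frame with made-up rows; `toy`, `unwanted`, `mask` and `cleaned` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: one empty separator row, one "Special" row, one "Shada" row
toy = pd.DataFrame({
    "Serial title": ["Survival", np.nan, "Special", "Shada", "Survival"],
    "Episode titles": ['"Part One"', np.nan, "x", "y", '"Part Two"'],
})

# Keep rows that have a serial title and are not in the unwanted list
unwanted = ["Special", "Shada"]
mask = toy["Serial title"].notna() & ~toy["Serial title"].isin(unwanted)

# .copy() makes the result independent of `toy`, so later edits do not warn
cleaned = toy[mask].copy()
print(cleaned)
```

A mask keeps all filtering rules in one expression, which makes it easier to see which rows are excluded.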

#### Renaming rows and columns

In [10]:
#Renaming UK viewers(millions) and AI columns by dropping [7] 
df_clas=df_clas.rename(columns={'UK viewers(millions) [7]':'UK viewers(millions)',
                           'AI [7]': 'AI'})

#Renaming Rows Attributes
df_clas=df_clas.replace({"[Episode 3][note 1]":"\"Episode 3\""})
df_clas=df_clas.replace(["143a","143b","143c","143d"],143)
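#str.strip('[a]') removes any leading/trailing '[', 'a' or ']' characters, i.e. the "[a]" footnote marker
df_clas['Original air date'] = df_clas['Original air date'].str.strip('[a]')
#Note: depending on the pandas version, a list with value=None either forward-fills the matched cells (older releases) or inserts None
df_clas=df_clas.replace(['—','–'],None)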
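Note that `str.strip('[a]')` treats its argument as a set of characters rather than a literal marker. The sketch below contrasts it with an anchored regular expression on made-up values; the sample strings and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical raw values: one date with a footnote marker, one without
dates = pd.Series(["17 January 1970[a]", "23 November 1963"])

# Strips any of the characters '[', 'a', ']' from both ends
stripped = dates.str.strip("[a]")

# Removes only a literal "[a]" at the end of the string
by_regex = dates.str.replace(r"\[a\]$", "", regex=True)

# The difference shows on text that starts or ends with the letter "a"
word = pd.Series(["alpha[a]"])
word_stripped = word.str.strip("[a]")                        # also eats the outer "a"s
word_by_regex = word.str.replace(r"\[a\]$", "", regex=True)  # keeps the word intact

parsed = pd.to_datetime(by_regex)
print(parsed)
```

For dates both approaches give the same result, since a date string neither starts nor ends with one of the stripped characters; the regex is simply the more explicit choice.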

#### Changing datatype

In [11]:
df_clas=df_clas.convert_dtypes()

In [12]:
df_clas.dtypes

Story                   object
Serial                  object
Serial title            string
Episode titles          string
Directed by             string
Written by              string
Original air date       string
Prod.code               string
UK viewers(millions)    object
AI                      object
dtype: object

In [13]:
df_clas['Story'] = df_clas['Story'].astype('int64') 
df_clas['Serial'] = df_clas['Serial'].astype('int64') 
df_clas['UK viewers(millions)'] = df_clas['UK viewers(millions)'].astype('float') 
df_clas['AI'] = df_clas['AI'].astype('int64')
df_clas['Original air date']= pd.to_datetime(df_clas['Original air date'])
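`astype` raises an error as soon as one cell cannot be converted. One way to locate such cells beforehand is `pd.to_numeric` with `errors="coerce"`; the sketch below uses a made-up column containing one bad value, and the names `ai`, `as_numbers` and `bad_rows` are hypothetical.

```python
import pandas as pd

# "N/A" is a made-up cell that cannot be converted to a number
ai = pd.Series(["63", "59", "N/A", "56"])

# errors="coerce" turns unparseable cells into NaN instead of raising
as_numbers = pd.to_numeric(ai, errors="coerce")

# Rows that had a value but failed to parse
bad_rows = ai[as_numbers.isna() & ai.notna()]
print(bad_rows)
```

Inspecting `bad_rows` first shows which values need cleaning before the strict `astype` conversion is applied.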

In [14]:
df_clas.dtypes

Story                            int64
Serial                           int64
Serial title                    string
Episode titles                  string
Directed by                     string
Written by                      string
Original air date       datetime64[ns]
Prod.code                       string
UK viewers(millions)           float64
AI                               int64
dtype: object

### Additional Changes

#### Adding Episode #, Season #, Doctor #, and Lead Actor

In [15]:
conditions=[df_clas['Story'].between(1,8),df_clas['Story'].between(9,17),
            df_clas['Story'].between(18,27),df_clas['Story'].between(28,36),
            df_clas['Story'].between(37,43),df_clas['Story'].between(44,50),
            df_clas['Story'].between(51,54),df_clas['Story'].between(55,59),
            df_clas['Story'].between(60,64),df_clas['Story'].between(65,69),
            df_clas['Story'].between(70,74),df_clas['Story'].between(75,79),
            df_clas['Story'].between(80,85),df_clas['Story'].between(86,91),
            df_clas['Story'].between(92,97),df_clas['Story'].between(98,103),
            df_clas['Story'].between(104,108),df_clas['Story'].between(109,115),
            df_clas['Story'].between(116,122),df_clas['Story'].between(123,129),
            df_clas['Story'].between(130,136),df_clas['Story'].between(137,142),
            df_clas['Story'].between(143,143),df_clas['Story'].between(144,147),
            df_clas['Story'].between(148,151),df_clas['Story'].between(152,155)]

choices = list(range(1,27))

df_clas['Season'] = np.select(conditions,choices)
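To illustrate how `np.select` maps story ranges to season numbers, the sketch below applies it to a few story numbers taken from the first three ranges above and compares the result with `pd.cut`, a possible shorter alternative that is not used in the notebook. The variable names are assumptions.

```python
import numpy as np
import pandas as pd

story = pd.Series([1, 8, 9, 17, 18, 27])

# np.select, as in the notebook: the first matching condition wins,
# and rows matching no condition get the default value 0
conditions = [story.between(1, 8), story.between(9, 17), story.between(18, 27)]
season_select = np.select(conditions, [1, 2, 3], default=0)

# pd.cut with right-closed bins (0, 8], (8, 17], (17, 27]
season_cut = pd.cut(story, bins=[0, 8, 17, 27], labels=[1, 2, 3]).astype(int)

print(season_select.tolist(), season_cut.tolist())
```

With `pd.cut` only the upper bound of each season has to be listed, which shortens the 26-entry condition list; `np.select` remains the more flexible option when the conditions are not simple ranges.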

In [16]:
cond=[df_clas['Season'].between(1,3),df_clas['Season'].between(4,6),
      df_clas['Season'].between(7,11),df_clas['Season'].between(12,18),
      df_clas['Season'].between(19,21),df_clas['Season'].between(22,23),
      df_clas['Season'].between(24,26)]

doctor = ['First Doctor','Second Doctor','Third Doctor',
          'Fourth Doctor','Fifth Doctor','Sixth Doctor','Seventh Doctor']
actor = ['William Hartnell','Patrick Troughton','Jon Pertwee',
         'Tom Baker','Peter Davison','Colin Baker','Sylvester McCoy']
df_clas['Doctor'] = np.select(cond,doctor)
df_clas['Lead Actor'] = np.select(cond,actor)

Adding the **Episode** column to the dataframe.

In [17]:
episodes = []
for s in range(1,df_clas['Season'].nunique()+1):
    n_ep = len(df_clas[df_clas['Season']==s])
    for n in range(1,n_ep+1):
        episodes.append(n)
        
df_clas['Episode'] = episodes
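The loop above numbers episodes within each season and assumes the rows are already ordered by season. A possible vectorised equivalent is `groupby(...).cumcount()`, sketched here on a toy frame (the frame `toy` is made up for illustration).

```python
import pandas as pd

toy = pd.DataFrame({"Season": [1, 1, 1, 2, 2]})

# cumcount() numbers the rows of each group from 0, so add 1
toy["Episode"] = toy.groupby("Season").cumcount() + 1
print(toy)
```

Because `cumcount` works per group, it gives the right numbers even if rows of different seasons are interleaved.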

Reordering columns in the **df_clas** dataframe.

In [18]:
column_names = ["Story", "Serial", "Episode", "Serial title",
                "Episode titles", "Directed by", "Written by",
                "Original air date", "Prod.code", "UK viewers(millions)", 
                "AI", "Season", "Doctor", "Lead Actor"]

df_clas = df_clas.reindex(columns=column_names)

#### End result

In [19]:
#Resetting Index
df_clas.index = np.arange(1,len(df_clas)+1)
df_clas

Unnamed: 0,Story,Serial,Episode,Serial title,Episode titles,Directed by,Written by,Original air date,Prod.code,UK viewers(millions),AI,Season,Doctor,Lead Actor
1,1,1,1,An Unearthly Child,"""An Unearthly Child""",Waris Hussein,Anthony Coburn and C. E. Webber (uncredited),1963-11-23,A,4.4,63,1,First Doctor,William Hartnell
2,1,1,2,An Unearthly Child,"""The Cave of Skulls""",Waris Hussein,Anthony Coburn,1963-11-30,A,5.9,59,1,First Doctor,William Hartnell
3,1,1,3,An Unearthly Child,"""The Forest of Fear""",Waris Hussein,Anthony Coburn,1963-12-07,A,6.9,56,1,First Doctor,William Hartnell
4,1,1,4,An Unearthly Child,"""The Firemaker""",Waris Hussein,Anthony Coburn,1963-12-14,A,6.4,55,1,First Doctor,William Hartnell
5,2,2,5,The Daleks,"""The Dead Planet""",Christopher Barry,Terry Nation,1963-12-21,B,6.9,59,1,First Doctor,William Hartnell
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
691,154,3,10,The Curse of Fenric,"""Part Three""",Nicholas Mallett,Ian Briggs,1989-11-08,7M,4.0,68,26,Seventh Doctor,Sylvester McCoy
692,154,3,11,The Curse of Fenric,"""Part Four""",Nicholas Mallett,Ian Briggs,1989-11-15,7M,4.2,68,26,Seventh Doctor,Sylvester McCoy
693,155,4,12,Survival,"""Part One""",Alan Wareing,Rona Munro,1989-11-22,7P,5.0,69,26,Seventh Doctor,Sylvester McCoy
694,155,4,13,Survival,"""Part Two""",Alan Wareing,Rona Munro,1989-11-29,7P,4.8,69,26,Seventh Doctor,Sylvester McCoy


Using **.groupby()** and **.stack()**, we can double-check the number of episodes per season.

In [20]:
df_clas[['Season','Episode']].groupby('Season').nunique().stack()

Season         
1       Episode    42
2       Episode    39
3       Episode    45
4       Episode    43
5       Episode    40
6       Episode    44
7       Episode    25
8       Episode    25
9       Episode    26
10      Episode    26
11      Episode    26
12      Episode    20
13      Episode    26
14      Episode    26
15      Episode    26
16      Episode    26
17      Episode    20
18      Episode    28
19      Episode    26
20      Episode    23
21      Episode    24
22      Episode    13
23      Episode    14
24      Episode    14
25      Episode    14
26      Episode    14
dtype: int64

In [21]:
df_clas.shape

(695, 14)

In [22]:
df_clas.dtypes

Story                            int64
Serial                           int64
Episode                          int64
Serial title                    string
Episode titles                  string
Directed by                     string
Written by                      string
Original air date       datetime64[ns]
Prod.code                       string
UK viewers(millions)           float64
AI                               int64
Season                           int32
Doctor                          object
Lead Actor                      object
dtype: object

##### Optional: Saving dataframe as CSV  file

In [23]:
df_clas.to_csv("dr_who_classical.csv")

## Scraping 2005-Present Episodes.
### Data Extraction

Define the URL of the list of new episodes.

In [24]:
url_new = 'https://en.wikipedia.org/wiki/List_of_Doctor_Who_episodes_(2005%E2%80%93present)'

Because the tables have a similar format, the same code and loop are used to scrape them into dataframes. To prevent the new dataframes from overwriting the existing ones, the prefix **dfr_** is used, where **r** designates seasons from the **_Revival Era_**. As of Feb. 2021 there are **12 seasons** in total; however, unlike on the previous page, the season tables start at index **3** and end at index **16**, which gives **range(3,17)**.

In [25]:
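#Download and parse the page once, then keep tables 3..16 as dfr_3 ... dfr_16
tables_rev = pd.read_html(url_new,header=0)
for c in range(3,17):
    globals()['dfr_{}'.format(c)] = tables_rev[c]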

Because the column titles of the latest table (**Season 12**) differ from the others, they must be changed to match the structure of the other dataframes. Using the **.rename()** function, the columns "No.story" and "No. inseries" are renamed to "Story" and "Episode".

*_Season 12_ is saved under the dataframe name _dfr_16_.

In [26]:
dfr_16=dfr_16.rename(columns={"No.story":"Story",'No. inseries':'Episode'})
#dfr_16
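The renaming matters because `pd.concat` aligns frames by column name. The sketch below shows, on two one-row toy frames with made-up values, what happens with and without the rename; `older`, `latest`, `misaligned` and `aligned` are hypothetical names.

```python
import pandas as pd

older = pd.DataFrame({"Story": ["287"], "Episode": ["11"]})
latest = pd.DataFrame({"No.story": ["288"], "No. inseries": ["1"]})

# Without renaming, concat keeps all four column names and fills the gaps with NaN
misaligned = pd.concat([older, latest], ignore_index=True)

# After renaming, both frames share the same two columns
aligned = pd.concat(
    [older, latest.rename(columns={"No.story": "Story", "No. inseries": "Episode"})],
    ignore_index=True,
)
print(misaligned)
print(aligned)
```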

Merge all dataframes under the name **df_rev**.

In [27]:
pdList_new = []
pdList_new.extend(value for name, value in locals().items() 
              if name.startswith('dfr_'))
df_rev = pd.concat(pdList_new, ignore_index=True)

## Dataframe Transformation/ Cleaning.

Dataset information prior to changes.

In [28]:
df_rev.dtypes

Story                       object
Episode                     object
Title                       object
Directed by                 object
Written by                  object
Original air date           object
Prod.code                   object
UK viewers(millions) [9]    object
AI [9]                      object
Unnamed: 9                  object
Unnamed: 10                 object
Unnamed: 11                 object
Unnamed: 12                 object
Unnamed: 8                  object
dtype: object

In [29]:
df_rev.shape

(187, 14)

### Changing Columns and Rows Attributes

#### Removing  rows and columns

In [30]:
#Removing NaN rows
df_rev=df_rev.dropna(axis='rows',subset=['Title'])

#Remove Unnamed Columns
df_rev=df_rev.drop(columns=['Unnamed: 8','Unnamed: 9','Unnamed: 10','Unnamed: 11','Unnamed: 12'])

#Removing Special, Series, Part 1&2 rows.
df_rev.drop(df_rev[df_rev['Title'].str.contains("Special", na=False)].index,inplace = True)
df_rev.drop(df_rev[df_rev['Title'] == 'Series'].index, inplace = True)
df_rev.drop(df_rev[df_rev['Title'] == 'Part 1'].index, inplace = True) 
df_rev.drop(df_rev[df_rev['Title'] == 'Part 2'].index, inplace = True)

#### Renaming rows and columns

In [31]:
#Renaming UK viewers(millions) and AI columns by dropping [9] 
df_rev=df_rev.rename(columns={'UK viewers(millions) [9]':'UK viewers(millions)',
                              'AI [9]': 'AI'})

#Removing the a/b/c suffixes from Story numbers and marking † as '(missing)'
df_rev['Story']=df_rev['Story'].str.replace('a', '')
df_rev['Story']=df_rev['Story'].str.replace('b', '')
df_rev['Story']=df_rev['Story'].str.replace('c', '')
df_rev['Story']=df_rev['Story'].str.replace('†', '(missing)')

#Renaming Rows Attributes
df_rev['Prod.code']=df_rev['Prod.code'].fillna('None')
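#Note: depending on the pandas version, a list with value=None either forward-fills the matched cells (older releases) or inserts None
df_rev=df_rev.replace(['—','–'],None)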

#Adding Missing indexes to a Story and Ep column
df_rev.loc[df_rev['Title'] == '"The Day of the Doctor"', ['Story']] = '240'
df_rev.loc[df_rev['Title'] == '"The Time of the Doctor"', ['Story']] = '241'
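The three separate `str.replace` calls for the letters a, b and c could be merged into one regular expression. The sketch below is a variation rather than the notebook's method: it removes a single trailing letter only, whereas the calls above remove the letters anywhere in the string. The sample values are illustrative.

```python
import pandas as pd

# Illustrative story numbers with part suffixes
story = pd.Series(["199a", "199b", "202c", "240"])

# One regex removes a single trailing a, b or c; the rest of the string is kept
cleaned = story.str.replace(r"[abc]$", "", regex=True).astype(int)
print(cleaned.tolist())
```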

#### Changing datatype

In [32]:
df_rev=df_rev.convert_dtypes()

In [33]:
df_rev.dtypes

Story                   string
Episode                 object
Title                   string
Directed by             string
Written by              string
Original air date       string
Prod.code               object
UK viewers(millions)    object
AI                      object
dtype: object

In [34]:
df_rev['Story'] = df_rev['Story'].astype('int')
df_rev['Episode'] = df_rev['Episode'].astype('int')
df_rev['UK viewers(millions)'] = df_rev['UK viewers(millions)'].astype('float') 
df_rev['AI'] = df_rev['AI'].astype('int')
df_rev['Prod.code'] = df_rev['Prod.code'].astype('str')
df_rev['Original air date']= pd.to_datetime(df_rev['Original air date'])

In [35]:
df_rev.dtypes

Story                            int32
Episode                          int32
Title                           string
Directed by                     string
Written by                      string
Original air date       datetime64[ns]
Prod.code                       object
UK viewers(millions)           float64
AI                               int32
dtype: object

### Additional Changes

#### Adding Season #, Doctor #, and Lead Actor

In [36]:
conditions=[df_rev['Story'].between(157,166),df_rev['Story'].between(167,177),
            df_rev['Story'].between(178,187),df_rev['Story'].between(188,202),
            df_rev['Story'].between(203,212),df_rev['Story'].between(213,224),
            df_rev['Story'].between(225,241),df_rev['Story'].between(242,252),
            df_rev['Story'].between(253,263),df_rev['Story'].between(264,276),
            df_rev['Story'].between(277,287),df_rev['Story'].between(288,296)]

choices = list(range(1,13))

df_rev['Season'] = np.select(conditions,choices)

In [37]:
cond=[df_rev['Season'].between(1,1),df_rev['Season'].between(2,4),
      df_rev['Season'].between(5,7),df_rev['Season'].between(8,10),
      df_rev['Season'].between(11,12)]

doctor = ['Ninth Doctor','Tenth Doctor','Eleventh Doctor',
          'Twelfth Doctor','Thirteenth Doctor']
actor = ['Christopher Eccleston','David Tennant',
         'Matt Smith','Peter Capaldi','Jodie Whittaker']

df_rev['Doctor'] = np.select(cond,doctor)
df_rev['Lead Actor'] = np.select(cond,actor)

#### End result

In [38]:
#Resetting Index
df_rev.index = np.arange(1,len(df_rev)+1)
df_rev

Unnamed: 0,Story,Episode,Title,Directed by,Written by,Original air date,Prod.code,UK viewers(millions),AI,Season,Doctor,Lead Actor
1,157,1,"""Rose""",Keith Boak,Russell T Davies,2005-03-26,1.1,10.81,76,1,Ninth Doctor,Christopher Eccleston
2,158,2,"""The End of the World""",Euros Lyn,Russell T Davies,2005-04-02,1.2,7.97,76,1,Ninth Doctor,Christopher Eccleston
3,159,3,"""The Unquiet Dead""",Euros Lyn,Mark Gatiss,2005-04-09,1.3,8.86,80,1,Ninth Doctor,Christopher Eccleston
4,160,4,"""Aliens of London""",Keith Boak,Russell T Davies,2005-04-16,1.4,7.63,82,1,Ninth Doctor,Christopher Eccleston
5,160,5,"""World War Three""",Keith Boak,Russell T Davies,2005-04-23,1.5,7.98,81,1,Ninth Doctor,Christopher Eccleston
...,...,...,...,...,...,...,...,...,...,...,...,...
162,293,7,"""Can You Hear Me?""",Emma Sullivan,Charlene James and Chris Chibnall,2020-02-09,,4.90,78,12,Thirteenth Doctor,Jodie Whittaker
163,294,8,"""The Haunting of Villa Diodati""",Emma Sullivan,Maxine Alderton,2020-02-16,,5.07,80,12,Thirteenth Doctor,Jodie Whittaker
164,295,9,"""Ascension of the Cybermen""",Jamie Magnus Stone,Chris Chibnall,2020-02-23,,4.99,81,12,Thirteenth Doctor,Jodie Whittaker
165,295,10,"""The Timeless Children""",Jamie Magnus Stone,Chris Chibnall,2020-03-01,,4.69,82,12,Thirteenth Doctor,Jodie Whittaker


Using **.groupby()** and **.stack()**, we can double-check the number of episodes per season.

In [39]:
df_rev[['Season','Episode']].groupby('Season').nunique().stack()

Season         
1       Episode    13
2       Episode    13
3       Episode    13
4       Episode    13
5       Episode    13
6       Episode    13
7       Episode    13
8       Episode    12
9       Episode    12
10      Episode    12
11      Episode    10
12      Episode    10
dtype: int64

In [40]:
df_rev.shape

(166, 12)

In [41]:
df_rev.dtypes

Story                            int32
Episode                          int32
Title                           string
Directed by                     string
Written by                      string
Original air date       datetime64[ns]
Prod.code                       object
UK viewers(millions)           float64
AI                               int32
Season                           int32
Doctor                          object
Lead Actor                      object
dtype: object

##### Optional: Saving dataframe as CSV  file

In [42]:
df_rev.to_csv("dr_who_revived.csv")

## Combining Dataframes.
The following portion combines the dataframes from the Classical and Revival Eras.

In the combined dataframe, the episode title columns are renamed to **"Episode Title"**.

In [43]:
df_clas.rename(columns={"Episode titles": "Episode Title"}, inplace=True)
df_rev.rename(columns={"Title":"Episode Title"},inplace=True)

In [44]:
dr_who = pd.concat([df_clas, df_rev], axis=0,sort=False)
dr_who

Unnamed: 0,Story,Serial,Episode,Serial title,Episode Title,Directed by,Written by,Original air date,Prod.code,UK viewers(millions),AI,Season,Doctor,Lead Actor
1,1,1.0,1,An Unearthly Child,"""An Unearthly Child""",Waris Hussein,Anthony Coburn and C. E. Webber (uncredited),1963-11-23,A,4.40,63,1,First Doctor,William Hartnell
2,1,1.0,2,An Unearthly Child,"""The Cave of Skulls""",Waris Hussein,Anthony Coburn,1963-11-30,A,5.90,59,1,First Doctor,William Hartnell
3,1,1.0,3,An Unearthly Child,"""The Forest of Fear""",Waris Hussein,Anthony Coburn,1963-12-07,A,6.90,56,1,First Doctor,William Hartnell
4,1,1.0,4,An Unearthly Child,"""The Firemaker""",Waris Hussein,Anthony Coburn,1963-12-14,A,6.40,55,1,First Doctor,William Hartnell
5,2,2.0,5,The Daleks,"""The Dead Planet""",Christopher Barry,Terry Nation,1963-12-21,B,6.90,59,1,First Doctor,William Hartnell
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
162,293,,7,,"""Can You Hear Me?""",Emma Sullivan,Charlene James and Chris Chibnall,2020-02-09,,4.90,78,12,Thirteenth Doctor,Jodie Whittaker
163,294,,8,,"""The Haunting of Villa Diodati""",Emma Sullivan,Maxine Alderton,2020-02-16,,5.07,80,12,Thirteenth Doctor,Jodie Whittaker
164,295,,9,,"""Ascension of the Cybermen""",Jamie Magnus Stone,Chris Chibnall,2020-02-23,,4.99,81,12,Thirteenth Doctor,Jodie Whittaker
165,295,,10,,"""The Timeless Children""",Jamie Magnus Stone,Chris Chibnall,2020-03-01,,4.69,82,12,Thirteenth Doctor,Jodie Whittaker


In [45]:
dr_who.dtypes

Story                            int64
Serial                         float64
Episode                          int64
Serial title                    string
Episode Title                   string
Directed by                     string
Written by                      string
Original air date       datetime64[ns]
Prod.code                       object
UK viewers(millions)           float64
AI                               int64
Season                           int32
Doctor                          object
Lead Actor                      object
dtype: object
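In the output above, **Serial** has become `float64` because the Revival Era frame has no such column and the missing cells are filled with NaN. If whole numbers are preferred, one option is the nullable `Int64` dtype. This is a minimal sketch on toy frames; `classic`, `revival` and `both` are hypothetical names.

```python
import pandas as pd

classic = pd.DataFrame({"Story": [1, 2], "Serial": [1, 2]})
revival = pd.DataFrame({"Story": [157, 158]})  # no "Serial" column

both = pd.concat([classic, revival], ignore_index=True)

# NaN cannot be stored in an int64 column, so pandas upcasts to float64
dtype_before = both["Serial"].dtype

# The nullable Int64 dtype keeps whole numbers and marks gaps as <NA>
both["Serial"] = both["Serial"].astype("Int64")
print(dtype_before, both["Serial"].dtype)
```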

In [46]:
dr_who[['Season','Episode']].groupby('Season').nunique().stack()

Season         
1       Episode    42
2       Episode    39
3       Episode    45
4       Episode    43
5       Episode    40
6       Episode    44
7       Episode    25
8       Episode    25
9       Episode    26
10      Episode    26
11      Episode    26
12      Episode    20
13      Episode    26
14      Episode    26
15      Episode    26
16      Episode    26
17      Episode    20
18      Episode    28
19      Episode    26
20      Episode    23
21      Episode    24
22      Episode    13
23      Episode    14
24      Episode    14
25      Episode    14
26      Episode    14
dtype: int64

In [47]:
#Resetting Index
dr_who.index = np.arange(1,len(dr_who)+1)

To show all rows run the following code.

In [48]:
#To reset the rows option run pd.reset_option("display.max_rows")
pd.set_option("display.max_rows", None)
dr_who

Unnamed: 0,Story,Serial,Episode,Serial title,Episode Title,Directed by,Written by,Original air date,Prod.code,UK viewers(millions),AI,Season,Doctor,Lead Actor
1,1,1.0,1,An Unearthly Child,"""An Unearthly Child""",Waris Hussein,Anthony Coburn and C. E. Webber (uncredited),1963-11-23,A,4.4,63,1,First Doctor,William Hartnell
2,1,1.0,2,An Unearthly Child,"""The Cave of Skulls""",Waris Hussein,Anthony Coburn,1963-11-30,A,5.9,59,1,First Doctor,William Hartnell
3,1,1.0,3,An Unearthly Child,"""The Forest of Fear""",Waris Hussein,Anthony Coburn,1963-12-07,A,6.9,56,1,First Doctor,William Hartnell
4,1,1.0,4,An Unearthly Child,"""The Firemaker""",Waris Hussein,Anthony Coburn,1963-12-14,A,6.4,55,1,First Doctor,William Hartnell
5,2,2.0,5,The Daleks,"""The Dead Planet""",Christopher Barry,Terry Nation,1963-12-21,B,6.9,59,1,First Doctor,William Hartnell
6,2,2.0,6,The Daleks,"""The Survivors""",Christopher Barry,Terry Nation,1963-12-28,B,6.4,58,1,First Doctor,William Hartnell
7,2,2.0,7,The Daleks,"""The Escape""",Richard Martin,Terry Nation,1964-01-04,B,8.9,63,1,First Doctor,William Hartnell
8,2,2.0,8,The Daleks,"""The Ambush""",Christopher Barry,Terry Nation,1964-01-11,B,9.9,63,1,First Doctor,William Hartnell
9,2,2.0,9,The Daleks,"""The Expedition""",Christopher Barry,Terry Nation,1964-01-18,B,9.9,63,1,First Doctor,William Hartnell
10,2,2.0,10,The Daleks,"""The Ordeal""",Richard Martin,Terry Nation,1964-01-25,B,10.4,63,1,First Doctor,William Hartnell
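If the row limit should be lifted only for a single display, `pd.option_context` restores the previous setting automatically. A minimal sketch (the variable names are assumptions):

```python
import pandas as pd

default_limit = pd.get_option("display.max_rows")

# The option is changed only inside the with-block
with pd.option_context("display.max_rows", None):
    inside_limit = pd.get_option("display.max_rows")

# On leaving the block the previous value is back in place
after_limit = pd.get_option("display.max_rows")
print(default_limit, inside_limit, after_limit)
```

In a notebook, the dataframe would be displayed inside the `with` block, for example with `display(dr_who)`.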


#### Saving dataframe into a CSV.

In [49]:
dr_who.to_csv('dr_who_dataset.csv')
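When the file is read back, the index is stored as an unnamed first column and dates come back as plain text unless told otherwise. The sketch below shows one way to handle both, using an in-memory buffer instead of a file; the index label "No." and the toy row are assumptions for illustration.

```python
import io
import pandas as pd

toy = pd.DataFrame(
    {"Episode Title": ['"Rose"'],
     "Original air date": pd.to_datetime(["2005-03-26"])},
    index=[1],
)

buffer = io.StringIO()                 # stands in for a file on disk
toy.to_csv(buffer, index_label="No.")  # give the index column a name
buffer.seek(0)

# Restore the index and parse the date column while reading
restored = pd.read_csv(buffer, index_col="No.", parse_dates=["Original air date"])
print(restored.dtypes)
```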

<h4>Author:  <a href="https://www.linkedin.com/in/sergey-khegay/">Sergey Khegay</a></h4>