## Hulu Streaming Data

### 1. Read the Data In
This is where I import any needed Python libraries and all datasets I'll use in this notebook.

In [1]:
# Import needed libraries
import pandas as pd

In [2]:
# Read the CSV file in
hulu = pd.read_csv("datasets/HuluViewingHistoryUpdated.csv")

In [3]:
# View the first five rows of data
hulu.head()

Unnamed: 0,Episode Name,Series Name,Season,Last Played At
0,I Know Who Did It,Only Murders in the Building,2.0,10/30/2022 22:18
1,Sparring Partners,Only Murders in the Building,2.0,10/30/2022 21:39
2,"Hello, Darkness",Only Murders in the Building,2.0,10/30/2022 21:03
3,Flipping the Pieces,Only Murders in the Building,2.0,10/30/2022 5:48
4,Performance Review,Only Murders in the Building,2.0,10/30/2022 5:12


### 2. Manipulate and Clean the Data

In [4]:
# Add a "Streaming Service" column filled with "Hulu" so that each row's source is known once this data is combined with other services
hulu["Streaming Service"] = "Hulu"

# View the first five rows of data to ensure that new column was added correctly
hulu.head()

Unnamed: 0,Episode Name,Series Name,Season,Last Played At,Streaming Service
0,I Know Who Did It,Only Murders in the Building,2.0,10/30/2022 22:18,Hulu
1,Sparring Partners,Only Murders in the Building,2.0,10/30/2022 21:39,Hulu
2,"Hello, Darkness",Only Murders in the Building,2.0,10/30/2022 21:03,Hulu
3,Flipping the Pieces,Only Murders in the Building,2.0,10/30/2022 5:48,Hulu
4,Performance Review,Only Murders in the Building,2.0,10/30/2022 5:12,Hulu
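
The label column pays off once several services are stacked into one table. Below is a minimal sketch of that later step, not code from this notebook. The `netflix_demo` frame, the `hulu_demo` frame, and every row in them are hypothetical placeholders used only to show the idea.

```python
import pandas as pd

# Hypothetical stand-ins for two per-service dataframes (made-up rows)
hulu_demo = pd.DataFrame({"Series Name": ["Show A", "Show B"]})
netflix_demo = pd.DataFrame({"Series Name": ["Show C"]})

# Tag each frame with its source before combining
hulu_demo["Streaming Service"] = "Hulu"
netflix_demo["Streaming Service"] = "Netflix"

# Stack the frames; ignore_index=True rebuilds a clean 0..n-1 index
combined = pd.concat([hulu_demo, netflix_demo], ignore_index=True)

# Count rows per service to confirm every row kept its label
service_counts = combined["Streaming Service"].value_counts().to_dict()
print(service_counts)
```

Assigning a scalar to a new column broadcasts it to every row, which is why a single string is enough to label the whole frame.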


In [5]:
# Drop the columns that aren't needed (columns= already targets columns, so axis=1 is redundant)
hulu = hulu.drop(columns=["Episode Name", "Season"])

# View updated dataframe to make sure that columns were dropped
hulu.head()

Unnamed: 0,Series Name,Last Played At,Streaming Service
0,Only Murders in the Building,10/30/2022 22:18,Hulu
1,Only Murders in the Building,10/30/2022 21:39,Hulu
2,Only Murders in the Building,10/30/2022 21:03,Hulu
3,Only Murders in the Building,10/30/2022 5:48,Hulu
4,Only Murders in the Building,10/30/2022 5:12,Hulu


In [6]:
# Map the original column names to the names used in the main notebook
fixed_columns = {
    "Series Name": "Title",
    "Last Played At": "Date Watched",
}

# Apply the renames, then check that the column names display correctly
hulu = hulu.rename(columns=fixed_columns)
hulu.head()

Unnamed: 0,Title,Date Watched,Streaming Service
0,Only Murders in the Building,10/30/2022 22:18,Hulu
1,Only Murders in the Building,10/30/2022 21:39,Hulu
2,Only Murders in the Building,10/30/2022 21:03,Hulu
3,Only Murders in the Building,10/30/2022 5:48,Hulu
4,Only Murders in the Building,10/30/2022 5:12,Hulu


In [7]:
# Get info about the dataframe (column dtypes and non-null counts)
hulu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 427 entries, 0 to 426
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Title              426 non-null    object
 1   Date Watched       367 non-null    object
 2   Streaming Service  427 non-null    object
dtypes: object(3)
memory usage: 10.1+ KB
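
The `info()` output shows that `Date Watched` is stored as `object` (plain strings) and is missing in 60 of the 427 rows, while `Title` is missing in one row. A possible follow-up, which is not part of this notebook, is to parse the dates and decide how to treat incomplete rows. The sketch below uses a small made-up frame, and the format string is an assumption based on the month/day/year, 24-hour pattern seen in the preview rows.

```python
import pandas as pd

# Hypothetical sample mimicking the cleaned columns (illustrative rows only)
demo = pd.DataFrame({
    "Title": ["Only Murders in the Building", "Only Murders in the Building", None],
    "Date Watched": ["10/30/2022 22:18", None, "10/30/2022 5:12"],
    "Streaming Service": ["Hulu", "Hulu", "Hulu"],
})

# Count missing values per column
missing = demo.isna().sum()

# Parse the strings into datetimes; errors="coerce" turns unparseable values into NaT
demo["Date Watched"] = pd.to_datetime(
    demo["Date Watched"], format="%m/%d/%Y %H:%M", errors="coerce"
)

# One option: keep only rows that have both a title and a date
complete = demo.dropna(subset=["Title", "Date Watched"])

print(missing)
print(complete.dtypes)
```

Whether to drop incomplete rows or keep them depends on the analysis in the main notebook, so treat `dropna` here as one choice among several.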


In [8]:
# Store a copy of the cleaned dataframe so the main notebook can load it with %store -r
hulu_cleaned = hulu.copy()
%store hulu_cleaned

Stored 'hulu_cleaned' (DataFrame)
