**IMPORT THE FILES**

In [2]:
import pandas as pd

df = pd.read_csv('netflix_titles.csv')
df.head()


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020.0,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021.0,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021.0,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021.0,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021.0,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


**UNDERSTAND THE DATA**

In [3]:
df.info()
df.describe(include='all')
df.shape


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5398 entries, 0 to 5397
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   show_id       5398 non-null   object 
 1   type          5398 non-null   object 
 2   title         5397 non-null   object 
 3   director      3515 non-null   object 
 4   cast          4903 non-null   object 
 5   country       4735 non-null   object 
 6   date_added    5397 non-null   object 
 7   release_year  5397 non-null   float64
 8   rating        5397 non-null   object 
 9   duration      5397 non-null   object 
 10  listed_in     5397 non-null   object 
 11  description   5397 non-null   object 
dtypes: float64(1), object(11)
memory usage: 506.2+ KB


(5398, 12)
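
The non-null counts printed by `df.info()` above can be turned into missing-value percentages, which helps when deciding between dropping and filling. A minimal sketch that copies three counts from that output by hand (the `non_null` dictionary and the variable names are illustrative, not part of the notebook):

```python
# Non-null counts copied from the df.info() output above.
non_null = {"director": 3515, "cast": 4903, "country": 4735}
total_rows = 5398  # from the RangeIndex line / df.shape

# Percentage of rows where each column is missing, rounded to one decimal.
missing_pct = {
    col: round(100 * (total_rows - count) / total_rows, 1)
    for col, count in non_null.items()
}

for col, pct in missing_pct.items():
    print(f"{col}: {pct}% missing")
```

On a loaded frame, `df.isnull().mean() * 100` should give the same kind of summary for every column at once.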

**HANDLE THE MISSING VALUES**

In [4]:
# Check missing values per column
df.isnull().sum()

# Option 1: Drop rows with missing values (if few)
df.dropna(inplace=True)

# Option 2 (alternative to Option 1): Fill missing values with placeholders.
# Assign the result back to the column; calling fillna(..., inplace=True) on
# df[col] emits a FutureWarning and will stop working in pandas 3.0.
# Because Option 1 already ran, no missing values remain for these fills.
df['director'] = df['director'].fillna('Unknown')
df['cast'] = df['cast'].fillna('Not Available')
df['country'] = df['country'].fillna('Unknown')
df['date_added'] = df['date_added'].ffill()  # Forward fill example
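
The two options in the cell above lead to different row counts, so it can help to compare them on a small frame first. A minimal sketch, assuming a made-up three-row `toy` frame (its titles and names are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["Jane Doe", None, None],
    "country": ["India", None, "Japan"],
})

# Missing values per column.
missing_counts = toy.isnull().sum()

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: keep all rows and fill the gaps with placeholders.
# Passing a dict to DataFrame.fillna avoids chained inplace calls.
filled = toy.fillna({"director": "Unknown", "country": "Unknown"})

print(missing_counts)
print(len(dropped), len(filled))
```

Dropping keeps only the fully populated rows, while filling keeps every row at the cost of placeholder values.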

**REMOVE THE DUPLICATE DATA**

In [5]:
df.drop_duplicates(inplace=True)
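
Before removing duplicates it can be useful to count them. A minimal sketch on a made-up frame (the `toy` data is hypothetical):

```python
import pandas as pd

# Hypothetical toy frame with one fully repeated row.
toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s2", "s3"],
    "title": ["Alpha", "Beta", "Beta", "Gamma"],
})

# duplicated() marks every repeat of an earlier identical row.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence by default (keep="first").
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduped))
```

If only one column should define a duplicate, `drop_duplicates(subset=[...])` restricts the comparison to those columns.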


**STANDARDIZE TEXT VALUES**

In [6]:
# Lowercase 'country' and 'type'; also trim surrounding whitespace from 'country'
df['country'] = df['country'].str.lower().str.strip()
df['type'] = df['type'].str.lower()
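
The effect of the lowercase-and-strip chain can be checked on a few sample strings. A minimal sketch with made-up values; the second half, which splits comma-separated country lists into one value per row, is an optional extra step that the notebook itself does not perform:

```python
import pandas as pd

# Hypothetical raw values with inconsistent case and stray whitespace.
country = pd.Series(["  United States", "INDIA ", "united kingdom"])
cleaned = country.str.lower().str.strip()

# Some rows hold several countries in one string; splitting on commas
# and exploding gives one country per row (assumption: comma-separated).
multi = pd.Series(["United States, Ghana", "India"])
exploded = multi.str.lower().str.split(",").explode().str.strip()

print(cleaned.tolist())
print(exploded.tolist())
```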


**CONVERT DATE FORMAT**

In [7]:
df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')
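
With `errors='coerce'`, values that cannot be parsed become `NaT` silently, so it is worth counting them after the conversion. A minimal sketch on made-up strings; the explicit format `"%B %d, %Y"` is an assumption based on the sample rows shown earlier (for example "September 25, 2021"):

```python
import pandas as pd

# Hypothetical raw values: one clean, one with leading whitespace, one invalid.
raw = pd.Series(["September 25, 2021", " August 4, 2017", "not a date"])

# Strip whitespace first, then parse with an explicit format.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Count values that failed to parse and were coerced to NaT.
n_unparsed = int(parsed.isna().sum())

print(parsed)
print(n_unparsed)
```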


**RENAME THE COLUMNS**

In [8]:
# Rename all columns to lowercase and replace spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
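
The columns in this file are already lowercase with underscores, so the rename has no visible effect here. A minimal sketch showing what the same chain does to messier headers (the column names below are hypothetical):

```python
import pandas as pd

# Hypothetical headers with mixed case, spaces, and stray whitespace.
toy = pd.DataFrame(columns=[" Show ID", "Release Year ", "rating"])

# Same chain as the cell above: trim, lowercase, spaces to underscores.
toy.columns = toy.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(toy.columns))
```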


**CHECK AND FIX DATA TYPES**

In [9]:
# Check data types
df.dtypes

# 'release_year' was read as float64; convert it to int
# (this requires that the column has no missing values left)
df['release_year'] = df['release_year'].astype('int')
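
`astype('int')` only works when the column contains no missing values. A minimal sketch, on made-up data, of what happens otherwise and of pandas' nullable `Int64` dtype as one possible alternative (not what the notebook uses):

```python
import pandas as pd

# Hypothetical year column with one missing value, stored as float64.
years = pd.Series([2020.0, 2021.0, None])

# Plain int cannot represent NaN, so this conversion raises.
try:
    years.astype("int")
    plain_int_failed = False
except ValueError:
    plain_int_failed = True

# The nullable integer dtype keeps the gap as <NA> instead.
nullable = years.astype("Int64")

print(plain_int_failed)
print(nullable)
```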


**FINAL CLEANED DATA**

In [10]:
df.info()
df.describe(include='all')
df.head()


<class 'pandas.core.frame.DataFrame'>
Index: 2964 entries, 7 to 5396
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   show_id       2964 non-null   object        
 1   type          2964 non-null   object        
 2   title         2964 non-null   object        
 3   director      2964 non-null   object        
 4   cast          2964 non-null   object        
 5   country       2964 non-null   object        
 6   date_added    2964 non-null   datetime64[ns]
 7   release_year  2964 non-null   int64         
 8   rating        2964 non-null   object        
 9   duration      2964 non-null   object        
 10  listed_in     2964 non-null   object        
 11  description   2964 non-null   object        
dtypes: datetime64[ns](1), int64(1), object(10)
memory usage: 301.0+ KB


Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
7,s8,movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","united states, ghana, burkina faso, united kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,tv show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",united kingdom,2021-09-24,2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",united states,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...
12,s13,movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","germany, czech republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...
24,s25,movie,Jeans,S. Shankar,"Prashanth, Aishwarya Rai Bachchan, Sri Lakshmi...",india,2021-09-21,1998,TV-14,166 min,"Comedies, International Movies, Romantic Movies",When the father of the man she loves insists t...


**SAVE THE CLEANED DATA SET**

In [11]:
df.to_csv('netflix_titles_cleaned.csv', index=False)
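
CSV stores dates as text, so the datetime dtype is lost on reload unless the column is parsed again. A minimal sketch of a round trip through an in-memory buffer (the `toy` frame is hypothetical and `io.StringIO` stands in for a file on disk):

```python
import io
import pandas as pd

# Hypothetical cleaned frame with a datetime column.
toy = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "date_added": pd.to_datetime(["2021-09-24", "2021-09-21"]),
})

# Write without the index, as in the cell above.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading back.
reloaded = pd.read_csv(buffer, parse_dates=["date_added"])

# Writing the index as well produces an extra "Unnamed: 0" column on reload.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

print(reloaded.dtypes)
print(list(with_index.columns))
```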
