# **Importing Pandas**

In [1]:
import pandas as pd
import numpy as np

# **Pandas**

**Pandas** is a powerful **Python library for data manipulation and analysis.** It provides easy-to-use data structures and functions to work with structured data like tabular, time series, or matrix data.

**Pandas primarily provides two data structures: Series and DataFrame.**

**Series:** A one-dimensional labeled array capable of holding any data type.

**DataFrame:** A two-dimensional labeled data structure with columns of potentially different types.

# **Pandas - Series**

**Series** in pandas is a fundamental data structure that represents a one-dimensional array of indexed data. It can hold any type of data: **integers, strings, floats, Python objects**, etc. A Series is built on top of a NumPy array and behaves much like one, with additional capabilities such as handling missing data. Its index is also more flexible than the implicit integer positions of a NumPy array, because the labels can be integers, strings, or other hashable values.

# Creating a Series

In [2]:
s = pd.Series([1, 3, 5, 7, 9])
print(s)

0    1
1    3
2    5
3    7
4    9
dtype: int64


**Key Attributes**

**Values (s.values):** The data in the Series, as a NumPy array.

**Index (s.index):** The labels of each data point.
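
Both attributes can be read directly from a Series. A minimal sketch, reusing the values from the Series created above:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9])

print(s.values)  # the data as a NumPy array
print(s.index)   # the labels; a default RangeIndex when none are given
```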

# **Common Methods of Series**


# Descriptive Statistics

**s.describe():** Provides a quick summary of the data.

This method gives a statistical summary of the Series, including count, mean, standard deviation, minimum, maximum, and quartile values.

In [3]:
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])

# Descriptive statistics
print(s.describe())

count    5.000000
mean     5.000000
std      3.162278
min      1.000000
25%      3.000000
50%      5.000000
75%      7.000000
max      9.000000
dtype: float64


**s.mean() and s.median():** Compute the mean and the median of the data.

In [4]:
# Mean and median of the Series
print(s.mean())
print(s.median())

5.0
5.0


**s.std():** Computes the standard deviation.

In [5]:
# Standard deviation of the Series
print(s.std())

3.1622776601683795
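
Note that s.std() returns the sample standard deviation (it divides by n - 1), which is why the result is about 3.16 rather than 2.83. A small sketch comparing it with the population version through the ddof parameter:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9])

sample_std = s.std()            # ddof=1 by default: divides by n - 1
population_std = s.std(ddof=0)  # divides by n, matching np.std's default

print(sample_std)
print(population_std)
```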


**s.min() and s.max():** Compute the minimum and maximum values.

In [6]:
# Minimum and maximum values
print(s.min())
print(s.max())

1
9


# Data Manipulation

**s.map(func):** Applies a function to each element in the Series.

In [19]:
# Mapping function to double the values
doubled = s.map(lambda x: x * 2)
print(doubled)

0     2
1     6
2    10
3    14
4    18
dtype: int64


**s.apply(func):** Similar to map, but more flexible: it can forward extra arguments to the function, and it is also available on DataFrames, where it operates on whole rows or columns.

In [7]:
# Applying a function to calculate square root
sqrt = s.apply(lambda x: x ** 0.5)
print(sqrt)

0    1.000000
1    1.732051
2    2.236068
3    2.645751
4    3.000000
dtype: float64
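
The difference between the two methods can be sketched as follows. The helper `power` is a hypothetical function written only for this illustration:

```python
import pandas as pd

s = pd.Series([1, 3, 5])

# map also accepts a dict; values without an entry become NaN
names = s.map({1: "one", 3: "three"})


def power(x, exponent):
    """Hypothetical helper: raise x to the given exponent."""
    return x ** exponent


# apply forwards extra positional arguments to the function
cubed = s.apply(power, args=(3,))

print(names)
print(cubed)
```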


**s.sort_values():** Sorts the Series.

In [8]:
# Sorting the Series
s=pd.Series([1,4,2,5,23,34])
sorted_s = s.sort_values()
print(sorted_s)

0     1
2     2
1     4
3     5
4    23
5    34
dtype: int64
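
Sorting keeps each value attached to its original label, as the output shows. A short sketch of two common variations, descending order and renumbering the labels after the sort:

```python
import pandas as pd

s = pd.Series([1, 4, 2, 5, 23, 34])

# Largest first; each value keeps its original label
descending = s.sort_values(ascending=False)

# Same order, but the labels are reset to 0..n-1
renumbered = s.sort_values(ascending=False, ignore_index=True)

print(descending)
print(renumbered)
```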


**s.drop(labels):** Drops specified labels from the Series.

In [9]:
# Dropping the element labeled 0 (the first element here)
dropped = s.drop(0)  # drop() works on labels, not positions; it returns a new Series
print(dropped)

1     4
2     2
3     5
4    23
5    34
dtype: int64


In [10]:
print(s)

0     1
1     4
2     2
3     5
4    23
5    34
dtype: int64


In [11]:
s.drop(0, inplace=True)  # inplace=True modifies the original Series s instead of returning a copy
print(s)

1     4
2     2
3     5
4    23
5    34
dtype: int64
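
drop also accepts a list of labels, and its errors parameter controls what happens when a label is missing. A minimal sketch, assuming the same starting values as above:

```python
import pandas as pd

s = pd.Series([1, 4, 2, 5, 23, 34])

# Drop the elements labeled 0 and 2 in one call
without_two = s.drop([0, 2])

# Label 99 does not exist; errors="ignore" suppresses the KeyError
unchanged = s.drop(99, errors="ignore")

print(without_two)
print(len(unchanged))
```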


# Handling Missing Data

**s.isnull():** Checks for missing values, returns a Series of booleans.

In [12]:
# Checking for missing values
print(s.isnull())

1    False
2    False
3    False
4    False
5    False
dtype: bool


**s.notnull():** Opposite of isnull().

In [13]:
# Checking for non-null values
print(s.notnull())

1    True
2    True
3    True
4    True
5    True
dtype: bool


**s.fillna(value):** Fills missing values with a specified value.

In [14]:
s=pd.Series([1,np.nan,2,np.nan,4,5])
print(s)


0    1.0
1    NaN
2    2.0
3    NaN
4    4.0
5    5.0
dtype: float64


In [15]:
# Create a Series with missing values
s = pd.Series([1, 2, np.nan, 4, np.nan])
# Print the Series
print(s)

0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
dtype: float64


In [16]:
# Filling missing values with 0
filled = s.fillna(0)
print(filled)

0    1.0
1    2.0
2    0.0
3    4.0
4    0.0
dtype: float64
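
A constant is not the only option for filling. Two other common strategies, sketched on the same data, are filling with the mean and carrying the last valid value forward:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# mean() skips NaN, so the fill value is (1 + 2 + 4) / 3
mean_filled = s.fillna(s.mean())

# Forward fill: copy the last valid value into each gap
forward_filled = s.ffill()

print(mean_filled)
print(forward_filled)
```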


**s.dropna():** Returns a new Series with all missing values removed; the remaining values keep their original labels.

In [17]:
# Creating a Series with missing values
s_with_missing = pd.Series([1,np.nan,2,3,4,5,np.nan,np.nan,np.nan,9,10])
print(s_with_missing.isnull())
# Dropping missing values
dropped_missing = s_with_missing.dropna()  # drop(label) removes one labeled row; dropna() removes every missing value
print(dropped_missing)

0     False
1      True
2     False
3     False
4     False
5     False
6      True
7      True
8      True
9     False
10    False
dtype: bool
0      1.0
2      2.0
3      3.0
4      4.0
5      5.0
9      9.0
10    10.0
dtype: float64
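
Because dropna keeps the original labels, the result has gaps in its index. A small sketch that counts the missing values first and then renumbers the cleaned Series:

```python
import numpy as np
import pandas as pd

s_with_missing = pd.Series([1, np.nan, 2, 3, 4, 5, np.nan, np.nan, np.nan, 9, 10])

# Each True counts as 1, so the sum is the number of missing values
missing_count = s_with_missing.isnull().sum()

# Remove the missing values, then relabel the result 0..n-1
cleaned = s_with_missing.dropna().reset_index(drop=True)

print(missing_count)
print(cleaned)
```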


# Indexing, Slicing, and Filtering

**s.iloc[ ]:** Purely integer-location based indexing.

In [18]:
# Indexing by position
s = pd.Series([1, 3, 2, 1, 5, 6, 8, 99])
print("First position: ", s.iloc[0])  # iloc = integer location (position-based)
print("Last position is : ", s.iloc[-1])
s = s[::-1]  # reverse the Series; each value keeps its original label
print("reversed\n", s)

First position:  1
Last position is :  99
reversed
 7    99
6     8
5     6
4     5
3     1
2     2
1     3
0     1
dtype: int64


**s.loc[ ]:** Label-based indexing.

In [19]:
# Indexing by label
print(s.loc[0])  # Value labeled 0; after the reversal above it sits in the last position
print(s.loc[7])  # Value labeled 7; after the reversal it sits in the first position

1
99
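
The distinction matters most when labels and positions disagree, as they do in a reversed Series. A minimal sketch (the name `reversed_s` is chosen only for this illustration):

```python
import pandas as pd

s = pd.Series([1, 3, 2, 1, 5, 6, 8, 99])
reversed_s = s[::-1]  # values and labels are reversed together

by_position = reversed_s.iloc[0]  # first position holds the value 99
by_label = reversed_s.loc[0]      # label 0 still points at the value 1

print(by_position)
print(by_label)
```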


In [20]:
# Create a Series
s=pd.Series([10,20,30,40,50,60,70,80,90,100],index=['a','b','c','d','e','f','g','h','i','j'])
print(s)
# Accessing by position with iloc
print(s.iloc[2])
print(s.iloc[4:10:2])
# Accessing by label with loc
print(s.loc['a'])
print(s.loc['d':'i':2])  # Label slices include the end label, though the step of 2 skips 'i' here

a     10
b     20
c     30
d     40
e     50
f     60
g     70
h     80
i     90
j    100
dtype: int64
30
e    50
g    70
i    90
dtype: int64
10
d    40
f    60
h    80
dtype: int64
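
The end-point rule differs between the two indexers: a label slice with loc includes its end label, while a position slice with iloc excludes its end position. A short sketch on a smaller Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s.loc["a":"c"]   # includes the end label 'c'
by_position = s.iloc[0:2]   # excludes position 2, like a Python list slice

print(by_label)
print(by_position)
```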


**s[s > n]:** Filters and returns elements greater than n.

In [21]:
#Filtering the elements greater than 50
filtered_data=s[s>50]
print(filtered_data)

f     60
g     70
h     80
i     90
j    100
dtype: int64
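
Boolean masks can be combined. A minimal sketch, using the same Series, of the & operator, the isin method, and negation with ~:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=list("abcdefghij"))

in_range = s[(s > 30) & (s < 80)]  # both conditions must hold; parentheses are required
extremes = s[s.isin([10, 100])]    # keep values that appear in the list
not_small = s[~(s < 50)]           # ~ inverts a boolean mask

print(in_range)
print(extremes)
print(not_small)
```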


# Aggregation

**s.sum():** Sums up the values.

In [22]:
#sum of numbers
total=s.sum()
print(total)

550


**s.cumsum():** Cumulative sum.

In [23]:
# Cumulative sum of the Series: each entry is the sum of all values up to and including it
print(s)
cum_sum=s.cumsum()
print("cumulative sum is: ",cum_sum)

a     10
b     20
c     30
d     40
e     50
f     60
g     70
h     80
i     90
j    100
dtype: int64
cumulative sum is:  a     10
b     30
c     60
d    100
e    150
f    210
g    280
h    360
i    450
j    550
dtype: int64
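
One common use of a cumulative sum is a running share of the total. A small sketch on the same Series (the name `running_share` is an assumption made for this example):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=list("abcdefghij"))

# Fraction of the overall total reached at each label
running_share = s.cumsum() / s.sum()

print(running_share)
```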


**s.aggregate(func):** Aggregates using one or more operations.

In [25]:
# Aggregating using multiple operations
aggregate = s.aggregate(['sum', 'mean', 'median', 'std'])  # several statistics computed in one call
print(aggregate)

sum       550.000000
mean       55.000000
median     55.000000
std        30.276504
dtype: float64


# Creating Data Frame

In [26]:
# Define data
employees_data={
    'name':['emp1','emp2','emp3','emp4','emp5','emp6','emp7','emp8','emp9','emp10'],
    'bg':['A+','B+','O+','AB+','A-','B-','O-','AB-','B+','A+'],
    'email':['emp1@gmail.com','emp2@gmail.com','emp3@gmail.com','emp4@gmail.com','emp5@gmail.com','emp6@gmail.com','emp7@gmail.com','emp8@gmail.com','emp9@gmail.com','emp10@gmail.com'],
    'mnumber':['9123456789','2222222222','333333333','444444444','5555555555','6666666666','777777777','888888888','999999999','1010101010'],
    'role':['tech','tech1','tech2','tech3','tech4','tech5','tech6','tech7','tech8','tech9']
    # To use custom row labels, pass index=[...] to pd.DataFrame() rather than adding an 'index' key here
}
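# Create a DataFrame
emp_data_frame = pd.DataFrame(employees_data)
print(emp_data_frame)

# head(7) returns the first 7 rows (head() defaults to 5); its result is not
# displayed here because a notebook cell shows only the value of its last expression
emp_data_frame.head(7)
emp_data_frame.describe()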

    name   bg            email     mnumber   role
0   emp1   A+   emp1@gmail.com  9123456789   tech
1   emp2   B+   emp2@gmail.com  2222222222  tech1
2   emp3   O+   emp3@gmail.com   333333333  tech2
3   emp4  AB+   emp4@gmail.com   444444444  tech3
4   emp5   A-   emp5@gmail.com  5555555555  tech4
5   emp6   B-   emp6@gmail.com  6666666666  tech5
6   emp7   O-   emp7@gmail.com   777777777  tech6
7   emp8  AB-   emp8@gmail.com   888888888  tech7
8   emp9   B+   emp9@gmail.com   999999999  tech8
9  emp10   A+  emp10@gmail.com  1010101010  tech9


        name  bg           email     mnumber  role
count     10  10              10          10    10
unique    10   8              10          10    10
top     emp1  A+  emp1@gmail.com  9123456789  tech
freq       1   2               1           1     1
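
Custom row labels are passed to the DataFrame constructor through its index argument, not placed inside the data dictionary. A minimal sketch with a shortened version of the employee data:

```python
import pandas as pd

employees = {
    "name": ["emp1", "emp2", "emp3"],
    "role": ["tech", "tech1", "tech2"],
}

# Row labels go to the constructor, one label per row
emp_df = pd.DataFrame(employees, index=["1", "2", "3"])

print(emp_df)
print(emp_df.loc["2", "name"])  # look up a cell by row label and column name
```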


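**Concatenation:** pd.concat stacks DataFrames on top of one another and keeps each frame's original row labels, so the index of the concatenated result repeats 0, 1, 2. For comparison, the cell first performs an outer merge on name, which aligns rows by key and renames the overlapping age column to age_x and age_y.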

In [27]:
data1 = {
  "name": ["Sally", "Mary", "John"],
  "age": [50, 40, 30]
}

data2 = {
  "name": ["Sally", "Peter", "Micky"],
  "age": [77, 44, 22]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

newdf = df1.merge(df2, how='outer',on='name')

print(newdf)
concate_df=pd.concat([df1,df2])
print(concate_df)

    name  age_x  age_y
0   John   30.0    NaN
1   Mary   40.0    NaN
2  Micky    NaN   22.0
3  Peter    NaN   44.0
4  Sally   50.0   77.0
    name  age
0  Sally   50
1   Mary   40
2   John   30
0  Sally   77
1  Peter   44
2  Micky   22
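
The repeated row labels and the default _x/_y suffixes can both be adjusted. A short sketch with the same two frames; the suffix names are assumptions chosen for readability:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Sally", "Mary", "John"], "age": [50, 40, 30]})
df2 = pd.DataFrame({"name": ["Sally", "Peter", "Micky"], "age": [77, 44, 22]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2
stacked = pd.concat([df1, df2], ignore_index=True)

# suffixes replaces the default _x / _y on overlapping column names
merged = df1.merge(df2, how="outer", on="name", suffixes=("_df1", "_df2"))

print(stacked)
print(merged)
```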


# Merging Data Frames

In [28]:
# Team roles data
roles_data = {
    'Name': ['Dodagatta Nihar', 'Vignesh', 'Maheshwar', 'Naman', 'Naveen', 'Shreya', 'Varsha', 'Varun'],
    'Role': ['Founder', 'Growth Manager', 'Community Manager', 'Community Manager', 'Community Manager',
             'Course Designer', 'Course Designer', 'Public Relations Manager']
}

roles_df = pd.DataFrame(roles_data)

# Contact information data
contact_data = {
    'Name': ['Dodagatta Nihar', 'Vignesh', 'Maheshwar', 'Naman', 'Naveen', 'Shreya', 'Varsha', 'Varun'],
    'Phone Number': ['111-111-1111', '222-222-2222', '333-333-3333', '444-444-4444',
                     '555-555-5555', '666-666-6666', '777-777-7777', '888-888-8888'],
    'Email': ['nihar@masscoders.tech', 'vignesh@masscoders.tech', 'maheshwar@masscoders.tech', 'naman@masscoders.tech',
              'naveen@masscoders.tech', 'shreya@masscoders.tech', 'varsha@masscoders.tech', 'varun@masscoders.tech']
}

contact_df = pd.DataFrame(contact_data)

In [29]:
merged_df = pd.merge(roles_df, contact_df,on='Name')
print(merged_df)

              Name                      Role  Phone Number  \
0  Dodagatta Nihar                   Founder  111-111-1111   
1          Vignesh            Growth Manager  222-222-2222   
2        Maheshwar         Community Manager  333-333-3333   
3            Naman         Community Manager  444-444-4444   
4           Naveen         Community Manager  555-555-5555   
5           Shreya           Course Designer  666-666-6666   
6           Varsha           Course Designer  777-777-7777   
7            Varun  Public Relations Manager  888-888-8888   

                       Email  
0      nihar@masscoders.tech  
1    vignesh@masscoders.tech  
2  maheshwar@masscoders.tech  
3      naman@masscoders.tech  
4     naveen@masscoders.tech  
5     shreya@masscoders.tech  
6     varsha@masscoders.tech  
7      varun@masscoders.tech  
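
pd.merge performs an inner join by default, so rows without a match in both frames are dropped. The sketch below uses small hypothetical frames (the names and example.com addresses are placeholders) to contrast that with a left join, and uses indicator=True to show where each row came from:

```python
import pandas as pd

# Hypothetical data: Ben has a role but no contact entry
roles = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo"],
                      "Role": ["Founder", "Designer", "Manager"]})
contacts = pd.DataFrame({"Name": ["Ana", "Cleo"],
                         "Email": ["ana@example.com", "cleo@example.com"]})

inner = pd.merge(roles, contacts, on="Name")  # how="inner" is the default
left = pd.merge(roles, contacts, on="Name", how="left", indicator=True)

print(inner)
print(left)  # the _merge column reports 'both' or 'left_only' per row
```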


# **Importing Dataset**

Pandas reads datasets from many file formats, including CSV, Excel (xlsx), JSON, and SQL query results, through functions such as pd.read_csv, pd.read_excel, pd.read_json, and pd.read_sql.

In [30]:
df=pd.read_csv("D:/JupyterNotebooks/filmtv_movies.csv")

# Display the first few rows of the DataFrame to understand its structure and contents
df.head(5)

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0


The dataset contains information about movies, represented in a DataFrame structure.

**filmtv_id:** A unique identifier for each movie.

**title:** The title of the movie.

**year:** The release year of the movie.

**genre:** The genre of the movie.

**duration:** The duration of the movie in minutes.

**country:** The country where the movie was produced.

**directors:** Names of the directors of the movie.

**actors:** Names of the main actors in the movie.

**avg_vote, critics_vote, public_vote:** Average ratings from different sources.

**total_votes:** Total number of votes the movie received.

**description:** A short description of the movie plot.

**notes:** Additional notes or commentary about the movie.

**humor, rhythm, effort, tension, erotism:** Attributes of the movie rated on a numeric scale; in this dataset the values range from 0 to 5, as the summary statistics further down show.
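
Since the file path above is specific to one machine, the loading step can be tried without the file. The sketch below is an assumption-laden stand-in: it reads two rows copied from the output above out of an in-memory string and uses usecols to keep only some columns:

```python
import io

import pandas as pd

# Stand-in for the CSV file: a header and two rows taken from the output above
csv_text = """filmtv_id,title,year,genre,duration
3,18 anni tra una settimana,1991,Drama,98
17,Ride a Wild Pony,1976,Romantic,91
"""

# read_csv accepts a file path or any file-like object
movies = pd.read_csv(io.StringIO(csv_text), usecols=["title", "year", "genre"])

print(movies)
```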

# **Pandas - DataFrame**

# Properties of DataFrame

**df.head(n):**
The df.head(n) method is used to view the first n rows of the DataFrame. This is particularly useful for getting a quick snapshot of the data, especially to understand the structure and the types of data contained in each column. If you don't specify n, the default number of rows displayed is 5.

In [31]:
df.head(10)  # Displays the first 10 rows of the DataFrame

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0
5,21,The Uranian Conspiracy,1978,Spy,117,"Italy, Germany, Israel","Gianfranco Baldanello, Menahem Golan","Fabio Testi, Janet Agren, Assaf Dayan, Siegfri...",4.8,3.5,6.0,5,Two Israeli secret agents discover that traffi...,"Action and chases for half of Europe, espionag...",1,2,0,2,0
6,22,A ciascuno il suo,1967,Drama,93,Italy,Elio Petri,"Gian Maria Volonté, Irene Papas, Gabriele Ferz...",7.6,7.68,7.0,139,Investigations into two murders committed in a...,"Champion of the cinema of civil commitment, El...",0,2,3,3,1
7,23,Dead-Bang,1989,Crime,109,United States,John Frankenheimer,"Don Johnson, Penelope Ann Miller, William Fors...",6.0,6.0,6.0,27,"In the throes of a double murder, Jerry Beck, ...",When it comes to talking about mysterious plot...,0,2,0,2,1
8,24,A... come assassino,1966,Thriller,80,Italy,Ray Morrison (Angelo Dorigo),"Alan Steel, Mary Arden, Sergio Ciani, Ivano Da...",5.2,3.0,7.0,5,After a man's corpse is found by his niece in ...,Approximation and mediocrity in go-go.,1,2,0,1,0
9,26,At Close Range,1986,Drama,115,United States,James Foley,"Christopher Walken, Sean Penn, Chris Penn, Mar...",7.5,7.64,7.0,90,Young Brad (Penn) lives with his grandmother a...,"Powerful and brutal thriller, second work by J...",1,3,2,4,2


**df.tail(n):**
The df.tail(n) method is similar to df.head(n) but for the end of the DataFrame. It returns the last n rows. This is useful to see the most recent or the last few entries in your data, depending on the ordering of your dataset. Like df.head(n), the default value of n is 5 if it isn't specified.

In [32]:
df.tail(10)  # Displays the last 10 rows of the DataFrame

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
41389,232184,Nowhere,2023,Thriller,109,Spain,Albert Pintó,"Anna Castillo, Tamar Novas",5.3,5.25,5.0,10,"Mia, pregnant, flees together with her husband...",,0,0,0,0,0
41390,232203,Mes petites amoureuses,1974,Drama,123,France,Jean Eustache,"Martin Loeb, Jacqueline Dufranne, Jacques Roma...",8.3,8.5,8.0,3,Daniel is a silent boy who observes girls with...,,0,0,0,0,0
41391,232755,Organ Trail,2023,Western,112,United States,Michael Patrick Jann,"Zoé De Grand Maison, Mather Zickel, Lisa LoCic...",5.0,5.5,5.0,8,The young abigale Archer is located alone in M...,,0,0,0,0,0
41392,232757,Hidden Family Secrets,2021,Thriller,87,Canada,Stefan Brogren,"Alex Paxton-Beesley, Madelyn Keys, Sonja Smits...",5.3,,5.0,3,"While struggling to try to save her daughter, ...",,0,0,0,0,0
41393,232816,La stoccata vincente,2023,Biography,107,Italy,Nicola Campiotti,"Alessio Vassallo, Flavio Insinna, Elena Funari...",4.5,,5.0,4,The true story of the prodigy of fencing Paolo...,Freely inspired by the book The winning fool b...,0,0,0,0,0
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,4.0,3,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,3.0,3,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,6.0,6,A team building conference organized for a gro...,,0,0,0,0,0
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,6.0,5,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0
41398,232920,Invitación a un Asesinato,2023,Thriller,92,Mexico,J.M Cravioto,"Maribel Verdú, Stephanie Cayo, Manolo Cardona,...",6.0,,6.0,3,A great passionate of crime stories is trapped...,,0,0,0,0,0


**df.shape:**
The df.shape attribute of a DataFrame returns a tuple representing the dimensionality of the DataFrame. The first element of the tuple is the number of rows, and the second is the number of columns. This is useful when you need to know how large the dataset is, such as when you are preprocessing data or ensuring that data manipulations have executed correctly.

In [33]:
df.shape  # Outputs: (number of rows, number of columns)

(41399, 19)
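
Because shape is an ordinary tuple, it can be unpacked into separate variables. A minimal sketch on a small stand-in frame:

```python
import pandas as pd

toy = pd.DataFrame({"title": ["Diner", "Fail-Safe", "Restless"],
                    "year": [1982, 1964, 2011]})

n_rows, n_cols = toy.shape  # tuple unpacking

print(n_rows, n_cols)
print(len(toy))   # number of rows only
print(toy.size)   # total number of cells: rows * columns
```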

**df.columns:**
The df.columns attribute returns an Index object containing the column labels of the DataFrame. Knowing the column names is essential for accessing specific data in the DataFrame, performing analyses, and for data manipulation tasks like sorting, filtering, or applying functions to certain columns.

In [34]:
df.columns  # Lists all the column names in the DataFrame

Index(['filmtv_id', 'title', 'year', 'genre', 'duration', 'country',
       'directors', 'actors', 'avg_vote', 'critics_vote', 'public_vote',
       'total_votes', 'description', 'notes', 'humor', 'rhythm', 'effort',
       'tension', 'erotism'],
      dtype='object')

**Inspecting Data Types:** Each column in a DataFrame has a specific data type. Understanding these types is crucial for proper data manipulation, since the type determines which operations a column supports.

In [35]:
# Display the data types of each column
df.dtypes

filmtv_id         int64
title            object
year              int64
genre            object
duration          int64
country          object
directors        object
actors           object
avg_vote        float64
critics_vote    float64
public_vote     float64
total_votes       int64
description      object
notes            object
humor             int64
rhythm            int64
effort            int64
tension           int64
erotism           int64
dtype: object
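
Data types can also be used to select columns or be converted explicitly. A short sketch on a small stand-in frame built from values shown earlier:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Diner", "At Close Range"],
    "year": [1982, 1986],
    "avg_vote": [7.0, 7.5],
})

numeric = toy.select_dtypes(include="number")  # keep only numeric columns
year_as_float = toy["year"].astype("float64")  # explicit conversion; returns a new Series

print(numeric.columns.tolist())
print(year_as_float.dtype)
```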

**Summary Statistics:** For numerical columns, it is useful to get a sense of their central tendency and spread.

In [36]:
# Display summary statistics for numerical columns
df.describe()

Unnamed: 0,filmtv_id,year,duration,avg_vote,critics_vote,public_vote,total_votes,humor,rhythm,effort,tension,erotism
count,41399.0,41399.0,41399.0,41399.0,36703.0,41205.0,41399.0,41399.0,41399.0,41399.0,41399.0,41399.0
mean,57746.410179,1993.505302,100.537163,5.801522,5.796077,5.924135,36.986763,0.577381,1.345347,0.684847,0.919153,0.309814
std,59962.09573,23.685612,27.260962,1.403861,1.593062,1.480112,69.386853,0.899402,1.154829,1.112334,1.09541,0.64593
min,2.0,1897.0,41.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,15857.0,1976.0,90.0,4.8,4.67,5.0,5.0,0.0,0.0,0.0,0.0,0.0
50%,36266.0,2001.0,96.0,5.9,6.0,6.0,12.0,0.0,2.0,0.0,0.0,0.0
75%,70935.0,2013.0,107.0,6.9,7.0,7.0,36.0,1.0,2.0,1.0,2.0,0.0
max,232920.0,2023.0,1525.0,10.0,10.0,10.0,1082.0,5.0,5.0,5.0,5.0,5.0


# Accessing and Filtering:

**df.loc:**
The df.loc indexer is used for label-based indexing, meaning you can access rows and columns using their labels (i.e., index names and column names). It selects a subset of rows and columns from a DataFrame and supports single labels, lists of labels, label slices, and boolean masks.

In [37]:
df.head(5)

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0


In [38]:
# Selecting all rows and a specific column by label
titles=df.loc[:,'title']
print(titles)

0        Bugs Bunny's Third Movie: 1001 Rabbit Tales
1                          18 anni tra una settimana
2                                   Ride a Wild Pony
3                                              Diner
4                    A che servono questi quattrini?
                            ...                     
41394                             Gold Digger Killer
41395                            Addio al nubilato 2
41396                                    Konferensen
41397                                      Ballelina
41398                      Invitación a un Asesinato
Name: title, Length: 41399, dtype: object


In [39]:
# Selecting a range of rows (labels 10 through 20, end label included) and multiple columns by label
subset=df.loc[10:20,['title','year','genre']]
print(subset)
print(df.columns)

                                   title  year        genre
10             A Ghentar si muore facile  1968    Adventure
11               Sleeping with the Enemy  1990        Drama
12                   In Bed With Madonna  1990  Documentary
13                    Bowery at Midnight  1942       Horror
14  A mezzanotte va la ronda del piacere  1975       Comedy
15                          Mr. Majestyk  1974       Action
17                      About Last Night  1986       Comedy
18                             Fail-Safe  1964        Drama
19                      Some Like It Hot  1959       Comedy
20                    A qualsiasi prezzo  1968    Adventure
Index(['filmtv_id', 'title', 'year', 'genre', 'duration', 'country',
       'directors', 'actors', 'avg_vote', 'critics_vote', 'public_vote',
       'total_votes', 'description', 'notes', 'humor', 'rhythm', 'effort',
       'tension', 'erotism'],
      dtype='object')


In [40]:
# Conditional selection using a boolean array
dramas=df.loc[df['genre']=='Drama']
print(dramas)

       filmtv_id                      title  year  genre  duration  \
1              3  18 anni tra una settimana  1991  Drama        98   
6             22          A ciascuno il suo  1967  Drama        93   
9             26             At Close Range  1986  Drama       115   
11            32    Sleeping with the Enemy  1990  Drama        96   
18            49                  Fail-Safe  1964  Drama       110   
...          ...                        ...   ...    ...       ...   
41368     229838           Tereddüt Çizgisi  2023  Drama        84   
41370     229865                     Stolen  2023  Drama        92   
41371     229881                       Árni  2023  Drama       103   
41372     229883              Kanata no uta  2023  Drama        84   
41390     232203     Mes petites amoureuses  1974  Drama       123   

                              country      directors  \
1                               Italy  Luigi Perelli   
6                               Italy     Elio 

In [41]:
# Combining conditions with &; each condition must be wrapped in parentheses
multiple_condition = df.loc[(df['genre'] == 'Drama') & (df['avg_vote'] > 7.0)]
print(multiple_condition)

       filmtv_id                     title  year  genre  duration  \
6             22         A ciascuno il suo  1967  Drama        93   
9             26            At Close Range  1986  Drama       115   
18            49                 Fail-Safe  1964  Drama       110   
30            70  Can You Feel Me Dancing?  1986  Drama       120   
44            92                 Accattone  1961  Drama       120   
...          ...                       ...   ...    ...       ...   
41320     226639                Sambizanga  1972  Drama        97   
41349     229784                The Burial  2023  Drama       126   
41355     229808      Aku wa sonzai shinai  2023  Drama       106   
41366     229832       Magyarázat mindenre  2023  Drama       151   
41390     232203    Mes petites amoureuses  1974  Drama       123   

                 country            directors  \
6                  Italy           Elio Petri   
9          United States          James Foley   
18         United States
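
df.loc can also take a boolean mask and a column list together, which keeps the printed result narrow. A minimal sketch on a small stand-in frame built from rows shown earlier:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Diner", "At Close Range", "Dead-Bang", "A ciascuno il suo"],
    "genre": ["Comedy", "Drama", "Crime", "Drama"],
    "avg_vote": [7.0, 7.5, 6.0, 7.6],
})

# | means "or": keep dramas and anything rated 7.0 or higher
mask = (toy["genre"] == "Drama") | (toy["avg_vote"] >= 7.0)
selected = toy.loc[mask, ["title", "avg_vote"]]

# isin tests membership in a list of values
crime_or_comedy = toy.loc[toy["genre"].isin(["Crime", "Comedy"]), "title"]

print(selected)
print(crime_or_comedy)
```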

**df.iloc:**
While df.loc uses labels for indexing, df.iloc allows for integer-based indexing. You use df.iloc to access rows and columns by their integer positions, which makes it useful when you need to access data by its position in the DataFrame.

In [42]:
df.head(5)

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0


In [43]:
# Selecting a single row from the DataFrame
single_row = df.iloc[0]
print(single_row)

filmtv_id                                                       2
title                 Bugs Bunny's Third Movie: 1001 Rabbit Tales
year                                                         1982
genre                                                   Animation
duration                                                       76
country                                             United States
directors                    David Detiege, Art Davis, Bill Perez
actors                                                        NaN
avg_vote                                                      7.7
critics_vote                                                  8.0
public_vote                                                   7.0
total_votes                                                    22
description     With two protruding front teeth, a slightly sl...
notes           These are many small independent stories, whic...
humor                                                           3
rhythm    

In [44]:
# Selecting a specific row and columns by integer indices
specific_data=df.iloc[10,[0,1,2,3]]
print(specific_data)

filmtv_id                           30
title        A Ghentar si muore facile
year                              1968
genre                        Adventure
Name: 10, dtype: object


In [45]:
# Slicing to get multiple rows and columns
multi_slice = df.iloc[10:15,0:4]  # Rows 10 to 14 and columns 0 to 3
print(multi_slice)

    filmtv_id                                 title  year        genre
10         30             A Ghentar si muore facile  1968    Adventure
11         32               Sleeping with the Enemy  1990        Drama
12         34                   In Bed With Madonna  1990  Documentary
13         36                    Bowery at Midnight  1942       Horror
14         37  A mezzanotte va la ronda del piacere  1975       Comedy


**df.at:**
df.at is designed to access a single value for a row/column label pair. It is very similar to df.loc for accessing scalar values but is optimized for faster access when you only need to get or set a single value in a DataFrame.

In [46]:
# Access a specific single value using row label and column name
title_of_first_movie = df.at[0, 'title']
print(title_of_first_movie)
print(df.at[10,'genre'])

Bugs Bunny's Third Movie: 1001 Rabbit Tales
Adventure
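
As a minimal sketch of how `at` relates to the other scalar accessors, the example below reads the same cell with `at`, `loc`, and `iat` (the position-based counterpart of `at`), then sets a single value. The DataFrame `toy` and its string index labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data with string row labels
toy = pd.DataFrame({"title": ["A", "B"], "year": [1990, 1991]}, index=["m1", "m2"])

year_at = toy.at["m2", "year"]    # label-based scalar access
year_loc = toy.loc["m2", "year"]  # same value through the more general indexer
year_iat = toy.iat[1, 1]          # position-based scalar access (row 1, column 1)

# at can also set a single cell in place
toy.at["m1", "year"] = 1989
print(toy)
```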


**Filtering Based on Criteria:**
Filtering data based on specific criteria is a common operation in data analysis. Pandas provides several methods to perform these operations, often using boolean indexing.

In [47]:
# Filter movies released after 2010
recent_movies=df.loc[df['year']>2010]
print(recent_movies)

       filmtv_id                      title  year      genre  duration  \
22817      39955           The Tree of Life  2011      Drama       138   
23725      41448        Season of the Witch  2011  Adventure        95   
24186      42398                   Restless  2011      Drama        95   
24426      42847             Qualunquemente  2011  Grotesque        96   
24427      42848  Una sconfinata giovinezza  2011      Drama        98   
...          ...                        ...   ...        ...       ...   
41394     232817         Gold Digger Killer  2021   Thriller        87   
41395     232893        Addio al nubilato 2  2023     Comedy        90   
41396     232915                Konferensen  2023     Horror       100   
41397     232919                  Ballelina  2023   Thriller        92   
41398     232920  Invitación a un Asesinato  2023   Thriller        92   

                     country           directors  \
22817          United States     Terrence Malick   
23725  

In [48]:
# Movies with a high public vote and specific genre
highly_rated_thrillers = df[(df['public_vote']>=8)&(df['genre'] == 'Thriller')]
print(highly_rated_thrillers)

       filmtv_id                  title  year     genre  duration  \
21            54         Johnny O'Clock  1947  Thriller        95   
27            67  You'll Like My Mother  1973  Thriller        94   
140          236      Time Without Pity  1956  Thriller        88   
183          296   Strangers on a Train  1951  Thriller        96   
302          478               Gaslight  1944  Thriller       109   
...          ...                    ...   ...       ...       ...   
40950     216918              Aru otoko  2022  Thriller       121   
41118     220183        The Other Child  2022  Thriller       114   
41200     221403   Anatomie d'une chute  2023  Thriller       151   
41218     221890                Ming On  2022  Thriller       109   
41246     223251          Twisted Nerve  1968  Thriller       112   

             country         directors  \
21     United States     Robert Rossen   
27     United States    Lamont Johnson   
140    Great Britain      Joseph Losey   
183

In [49]:
# Movies from a specific country
us_movies = df[df['country'] == 'United States']
print(us_movies)

       filmtv_id                                          title  year  \
0              2    Bugs Bunny's Third Movie: 1001 Rabbit Tales  1982   
2             17                               Ride a Wild Pony  1976   
3             18                                          Diner  1982   
7             23                                      Dead-Bang  1989   
9             26                                 At Close Range  1986   
...          ...                                            ...   ...   
41377     231044                                        Shelter  2023   
41382     231556  To End All War: Oppenheimer & the Atomic Bomb  2023   
41384     232066                                  Hostage House  2021   
41385     232101                                 Fear the Night  2023   
41391     232755                                    Organ Trail  2023   

             genre  duration        country  \
0        Animation        76  United States   
2         Romantic        91 
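
The filters above all follow the same pattern: build a boolean Series, then pass it to `df[...]` or `df.loc[...]`. The sketch below shows a few common variations (combining masks with `&` and `|`, negating with `~`, matching several values with `isin`, and the string-based `query` method) on a small hypothetical DataFrame named `toy`.

```python
import pandas as pd

# Hypothetical toy data, not the movie dataset used above
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "year": [1995, 2012, 2015, 2020],
        "genre": ["Drama", "Thriller", "Comedy", "Thriller"],
    }
)

recent = toy["year"] > 2010          # boolean mask
thriller = toy["genre"] == "Thriller"

recent_thrillers = toy[recent & thriller]    # both conditions must hold
recent_or_thriller = toy[recent | thriller]  # either condition may hold
not_thrillers = toy[~thriller]               # negation of a mask
selected_genres = toy[toy["genre"].isin(["Drama", "Comedy"])]

# query expresses the same filter as a string
same_with_query = toy.query("year > 2010 and genre == 'Thriller'")
print(recent_thrillers)
```

Each comparison needs its own parentheses when written inline, as in the `highly_rated_thrillers` cell, because `&` and `|` bind more tightly than `>=` and `==` in Python.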

# Updating Rows and Columns

**df.drop:**
The .drop() method in pandas is used to remove rows or columns from a DataFrame. Its primary purpose is to drop specified labels from rows or columns.

**Parameters:**

**labels:** The row or column labels to drop.

**axis:** Specifies whether the labels refer to rows (axis=0) or columns (axis=1). By default, it's 0 (rows).

**index or columns:** An alternative way to specify the labels to drop, instead of using the labels parameter. It is equivalent to specifying axis=0 (for index) or axis=1 (for columns).

**inplace:** If True, the operation is done in place, meaning it modifies the DataFrame directly and returns None. If False or not specified, it returns a new DataFrame with the specified labels dropped.
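
The parameters described above can be sketched on a small hypothetical DataFrame (`toy` is made up for illustration). The example shows the `columns=` and `index=` shortcuts and the fact that `inplace=True` returns `None`.

```python
import pandas as pd

# Hypothetical toy data, not the movie dataset used above
toy = pd.DataFrame(
    {"title": ["A", "B", "C"], "year": [1990, 1991, 1992], "genre": ["Drama", "Comedy", "Horror"]}
)

without_year = toy.drop(columns=["year"])  # equivalent to labels=["year"], axis=1
without_first = toy.drop(index=[0])        # equivalent to labels=[0], axis=0

# With inplace=True the DataFrame is modified and the call returns None
working_copy = toy.copy()
result = working_copy.drop(columns=["genre"], inplace=True)

print(list(without_year.columns))
print(list(working_copy.columns))
print(result)
```

Because the default is `inplace=False`, `toy` itself still has all three columns after the first two calls.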

In [53]:
new_df = df.drop(labels=['title', 'year'], axis=1)  # Drop multiple columns by passing their labels with axis=1
print(new_df)

       filmtv_id      genre  duration                country  \
0              2  Animation        76          United States   
1              3      Drama        98                  Italy   
2             17   Romantic        91          United States   
3             18     Comedy        95          United States   
4             20     Comedy        85                  Italy   
...          ...        ...       ...                    ...   
41394     232817   Thriller        87  Canada, United States   
41395     232893     Comedy        90                  Italy   
41396     232915     Horror       100                 Sweden   
41397     232919   Thriller        92            South Korea   
41398     232920   Thriller        92                 Mexico   

                                  directors  \
0      David Detiege, Art Davis, Bill Perez   
1                             Luigi Perelli   
2                               Don Chaffey   
3                            Barry Levinson

In [57]:
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,4.0,3,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,3.0,3,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,6.0,6,A team building conference organized for a gro...,,0,0,0,0,0
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,6.0,5,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0


**Direct Assignment:**
Directly assign a value to a specific column or even a cell in a DataFrame.

In [58]:
df.at[0,'year']=1983
df.head()

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0


In [None]:
df['new_column'] = 'default value'  # Adds a new column with all entries set to 'default value'
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism,new_column
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,default value
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,default value
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,default value
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,default value
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,default value
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,4.0,3,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0,default value
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,3.0,3,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0,default value
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,6.0,6,A team building conference organized for a gro...,,0,0,0,0,0,default value
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,6.0,5,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0,default value


In [59]:
# Creating a new column
df['new_column']='default_value'
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism,new_column
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,default_value
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,default_value
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,default_value
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,default_value
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,default_value
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,4.0,3,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0,default_value
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,3.0,3,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0,default_value
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,6.0,6,A team building conference organized for a gro...,,0,0,0,0,0,default_value
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,6.0,5,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0,default_value


In [66]:
df.drop(labels=['new_column'],axis=1,inplace=True)
df.head()

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism,classic
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,True
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,True


In [67]:
df.head(5)

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism,classic
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.0,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,True
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.0,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.0,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.0,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,True


**Using loc for Conditional Updates:**
loc can be used to update rows and columns based on a condition.

In [68]:
# Marks movies before 2000 as classic
df.loc[df['year']<2000,'classic'] = True
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,public_vote,total_votes,description,notes,humor,rhythm,effort,tension,erotism,classic
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,7.0,22,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,True
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,7.0,4,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,5.0,10,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,6.0,18,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,7.0,15,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,4.0,3,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0,
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,3.0,3,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0,
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,6.0,6,A team building conference organized for a gro...,,0,0,0,0,0,
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,6.0,5,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0,
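
In the output above, rows that do not meet the condition show an empty `classic` value rather than `False`, because the `loc` assignment only touches the matching rows. If a complete boolean column is wanted, two common options are sketched below on a hypothetical DataFrame (`toy` and the column name `classic_loc` are illustrative assumptions).

```python
import pandas as pd

# Hypothetical toy data, not the movie dataset used above
toy = pd.DataFrame({"title": ["A", "B", "C"], "year": [1975, 1999, 2015]})

# Option 1: assign the boolean comparison directly
toy["classic"] = toy["year"] < 2000

# Option 2: set a default for every row, then update the matching rows with loc
toy["classic_loc"] = False
toy.loc[toy["year"] < 2000, "classic_loc"] = True

print(toy)
```

Both options leave no missing values, so the column keeps a plain boolean dtype.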


In [77]:
# Selecting all rows but only the 'title' and 'year' columns
filtered_data=df.loc[:,['title','year']]
filtered_data

Unnamed: 0,title,year
0,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983
1,18 anni tra una settimana,1991
2,Ride a Wild Pony,1976
3,Diner,1982
4,A che servono questi quattrini?,1942
...,...,...
41394,Gold Digger Killer,2021
41395,Addio al nubilato 2,2023
41396,Konferensen,2023
41397,Ballelina,2023


In [80]:
# Modifying multiple columns using loc, creating new columns too
df.loc[df['avg_vote']>6,['top_rated','must_watch']]=[True,True]

In [81]:
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,description,notes,humor,rhythm,effort,tension,erotism,classic,top_rated,must_watch
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,"With two protruding front teeth, a slightly sl...","These are many small independent stories, whic...",3,3,0,0,0,True,True,True
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,"Samantha, not yet eighteen, leaves the comfort...","Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True,True,True
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,"In the Australia of the pioneers, a boy and a ...","""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True,,
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,Five boys from Baltimore have a habit of meeti...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True,True,True
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,"With a stratagem, the penniless and somewhat p...",Taken from the play by Armando Curcio that the...,3,1,1,0,0,True,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,Celeste is an attractive waitress in the forti...,Freely taken from a true story.,0,0,0,0,0,,,
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,"When Eleonora is downloaded to the altar, her ...",Bachelorette fare sequel (2021).,0,0,0,0,0,,,
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,A team building conference organized for a gro...,,0,0,0,0,0,,,
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,Once Ok-Ju worked as a bodyguard and was one o...,,0,0,0,0,0,,,


**Using apply Function:**
The apply function runs a function on each element of a Series, or on each column (axis=0) or each row (axis=1) of a DataFrame.

In [82]:
df['length_category'] = df['duration'].apply(lambda x: 'Long' if x>120 else 'Short')
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,notes,humor,rhythm,effort,tension,erotism,classic,top_rated,must_watch,length_category
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,"These are many small independent stories, whic...",3,3,0,0,0,True,True,True,Short
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,"Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True,True,True,Short
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,"""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True,,,Short
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True,True,True,Short
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,Taken from the play by Armando Curcio that the...,3,1,1,0,0,True,,,Short
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,Freely taken from a true story.,0,0,0,0,0,,,,Short
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,Bachelorette fare sequel (2021).,0,0,0,0,0,,,,Short
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,,0,0,0,0,0,,,,Short
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,,0,0,0,0,0,,,,Short


In [86]:
# Create a DataFrame with multiple Series
data = {
    'A':[1,2,3],
    'B':[4,5,6],
    'C':[7,8,9]
}
num_data=pd.DataFrame(data)

In [87]:
num_data

Unnamed: 0,A,B,C
0,1,4,7
1,2,5,8
2,3,6,9


In [88]:
# Define a function that adds two values
def sum_series(x, y):
    return x + y

# Apply the function row by row (axis=1), passing the values from columns 'A' and 'B'
result = num_data.apply(lambda row: sum_series(row['A'], row['B']), axis=1)

# Print the result
print(result)

0    5
1    7
2    9
dtype: int64
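
For simple element-wise logic like the two examples above, a vectorized expression often gives the same result as `apply`. The sketch below compares both approaches on a hypothetical DataFrame (`toy` is made up for illustration); `np.where` is used here as one possible alternative, not as the notebook's method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the movie dataset used above
toy = pd.DataFrame({"duration": [76, 138, 120], "A": [1, 2, 3], "B": [4, 5, 6]})

# Element-wise apply on a Series versus a vectorized np.where
with_apply = toy["duration"].apply(lambda x: "Long" if x > 120 else "Short")
with_where = pd.Series(np.where(toy["duration"] > 120, "Long", "Short"), index=toy.index)

# Row-wise apply versus plain column arithmetic
row_sum_apply = toy.apply(lambda row: row["A"] + row["B"], axis=1)
row_sum_vectorized = toy["A"] + toy["B"]

print(with_where.tolist())
print(row_sum_vectorized.tolist())
```

`apply` remains the more flexible tool when the logic cannot be expressed with column operations.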


**Updating Using map or replace:**
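You can update a column based on a mapping dictionary or replace values. Note that map returns NaN for every value missing from the dictionary, whereas replace leaves unmatched values unchanged. Neither call modifies the DataFrame unless the result is assigned back to the column.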

In [91]:
df['genre'].map({'Drama':'Drama Film','Comedy':'Comedy Film'}) # Mapping existing values to new ones

0                NaN
1         Drama Film
2                NaN
3        Comedy Film
4        Comedy Film
            ...     
41394            NaN
41395    Comedy Film
41396            NaN
41397            NaN
41398            NaN
Name: genre, Length: 41399, dtype: object
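
The NaN entries in the output above come from genres that are not keys in the dictionary. The sketch below contrasts `map` and `replace` on a small hypothetical Series (`genre` and `mapping` are illustrative names) and shows one way to keep unmapped values when using `map`.

```python
import pandas as pd

# Hypothetical toy data, not the movie dataset used above
genre = pd.Series(["Drama", "Comedy", "Horror"])
mapping = {"Drama": "Drama Film", "Comedy": "Comedy Film"}

mapped = genre.map(mapping)        # 'Horror' is not a key, so it becomes NaN
replaced = genre.replace(mapping)  # 'Horror' is kept as-is

# map followed by fillna restores the original value where no mapping exists
mapped_with_fallback = genre.map(mapping).fillna(genre)

print(mapped.tolist())
print(replaced.tolist())
```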

In [97]:
df['country'] = df['country'].replace('USA', 'United States')  # Replacing specific values; assigning the result back avoids the chained inplace call
df

# Note: map is not a drop-in alternative here, because values missing from the dictionary would become NaN
# df['country'].map({'USA': 'United States'})

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,notes,humor,rhythm,effort,tension,erotism,classic,top_rated,must_watch,length_category
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,"These are many small independent stories, whic...",3,3,0,0,0,True,True,True,Short
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,"Luigi Perelli, the director of the ""Piovra"", o...",0,2,0,2,0,True,True,True,Short
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,"""Ecological"" story with a happy ending, not wi...",1,2,1,0,0,True,,,Short
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,A cast of will be famous for Levinson's direct...,2,2,0,1,2,True,True,True,Short
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,Taken from the play by Armando Curcio that the...,3,1,1,0,0,True,,,Short
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,Freely taken from a true story.,0,0,0,0,0,,,,Short
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,Bachelorette fare sequel (2021).,0,0,0,0,0,,,,Short
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,,0,0,0,0,0,,,,Short
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,,0,0,0,0,0,,,,Short


**Adding New Columns Based on Calculations:**
You can create new columns based on calculations from existing columns.

In [98]:
df['title_year']=df['title']+"( "+df['year'].astype(str)+")" # Creating a new column by combining existing columns
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,humor,rhythm,effort,tension,erotism,classic,top_rated,must_watch,length_category,title_year
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,3,3,0,0,0,True,True,True,Short,Bugs Bunny's Third Movie: 1001 Rabbit Tales( 1...
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,0,2,0,2,0,True,True,True,Short,18 anni tra una settimana( 1991)
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,1,2,1,0,0,True,,,Short,Ride a Wild Pony( 1976)
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,2,2,0,1,2,True,True,True,Short,Diner( 1982)
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,3,1,1,0,0,True,,,Short,A che servono questi quattrini?( 1942)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,0,0,0,0,0,,,,Short,Gold Digger Killer( 2021)
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,0,0,0,0,0,,,,Short,Addio al nubilato 2( 2023)
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,0,0,0,0,0,,,,Short,Konferensen( 2023)
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,0,0,0,0,0,,,,Short,Ballelina( 2023)


**Using assign to Create Columns:**
assign adds new columns in a functional style: it returns a new DataFrame and leaves the original unchanged, so the result must be assigned to a variable to be kept.

In [101]:
df=df.assign(
    is_older = lambda x:x['year']<2000,
    duration_hours = lambda x: x['duration'] /60
)# Adding multiple new columns

In [102]:
df

Unnamed: 0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,effort,tension,erotism,classic,top_rated,must_watch,length_category,title_year,is_older,duration_hours
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,0,0,0,True,True,True,Short,Bugs Bunny's Third Movie: 1001 Rabbit Tales( 1...,True,1.266667
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,0,2,0,True,True,True,Short,18 anni tra una settimana( 1991),True,1.633333
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,1,0,0,True,,,Short,Ride a Wild Pony( 1976),True,1.516667
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,0,1,2,True,True,True,Short,Diner( 1982),True,1.583333
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,1,0,0,True,,,Short,A che servono questi quattrini?( 1942),True,1.416667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,0,0,0,,,,Short,Gold Digger Killer( 2021),False,1.450000
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,0,0,0,,,,Short,Addio al nubilato 2( 2023),False,1.500000
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,0,0,0,,,,Short,Konferensen( 2023),False,1.666667
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,0,0,0,,,,Short,Ballelina( 2023),False,1.533333
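The point that assign returns a new DataFrame can be checked on a small example. This is a minimal sketch that assumes a hypothetical two-row frame named movies in place of the real dataset; only the column names year and duration come from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the movie DataFrame
movies = pd.DataFrame({"year": [1983, 2021], "duration": [76, 87]})

# assign returns a new DataFrame; `movies` itself is left unchanged
with_flags = movies.assign(
    is_older=lambda x: x["year"] < 2000,
    duration_hours=lambda x: x["duration"] / 60,
)

print(list(movies.columns))      # original columns only
print(list(with_flags.columns))  # original columns plus the two new ones
```

Because the original is untouched, assign fits well into method chains where each step produces a fresh DataFrame.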


# Changing the Name of the Index
Pandas allows you to rename the index of a DataFrame or Series, which can help in making the index more informative or aligning it with new data requirements.

**Renaming the Index of a DataFrame:**

In [105]:
df.index.name = 'movie_id'  # Names the index 'movie_id'

In [106]:
df

Unnamed: 0_level_0,filmtv_id,title,year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,effort,tension,erotism,classic,top_rated,must_watch,length_category,title_year,is_older,duration_hours
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,0,0,0,True,True,True,Short,Bugs Bunny's Third Movie: 1001 Rabbit Tales( 1...,True,1.266667
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,0,2,0,True,True,True,Short,18 anni tra una settimana( 1991),True,1.633333
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,1,0,0,True,,,Short,Ride a Wild Pony( 1976),True,1.516667
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,0,1,2,True,True,True,Short,Diner( 1982),True,1.583333
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,1,0,0,True,,,Short,A che servono questi quattrini?( 1942),True,1.416667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,0,0,0,,,,Short,Gold Digger Killer( 2021),False,1.450000
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,0,0,0,,,,Short,Addio al nubilato 2( 2023),False,1.500000
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,0,0,0,,,,Short,Konferensen( 2023),False,1.666667
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,0,0,0,,,,Short,Ballelina( 2023),False,1.533333
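As an alternative to setting the name on the index in place, rename_axis returns a renamed copy. The sketch below assumes a hypothetical toy frame; it is not the notebook's dataset.

```python
import pandas as pd

# Hypothetical toy data; the index starts out unnamed
toy = pd.DataFrame({"title": ["Diner", "Konferensen"]})

# rename_axis returns a new DataFrame whose index carries the given name
named = toy.rename_axis("movie_id")

print(toy.index.name)    # the original index is still unnamed
print(named.index.name)  # the copy's index is named 'movie_id'
```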


**Renaming Columns:**

In [109]:
df.rename(columns={'year': 'release_year', 'title': 'movie_title'}, inplace=True)
df

Unnamed: 0_level_0,filmtv_id,movie_title,release_year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,effort,tension,erotism,classic,top_rated,must_watch,length_category,title_year,is_older,duration_hours
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,0,0,0,True,True,True,Short,Bugs Bunny's Third Movie: 1001 Rabbit Tales( 1...,True,1.266667
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,0,2,0,True,True,True,Short,18 anni tra una settimana( 1991),True,1.633333
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,1,0,0,True,,,Short,Ride a Wild Pony( 1976),True,1.516667
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,0,1,2,True,True,True,Short,Diner( 1982),True,1.583333
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,1,0,0,True,,,Short,A che servono questi quattrini?( 1942),True,1.416667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,0,0,0,,,,Short,Gold Digger Killer( 2021),False,1.450000
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,0,0,0,,,,Short,Addio al nubilato 2( 2023),False,1.500000
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,0,0,0,,,,Short,Konferensen( 2023),False,1.666667
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,0,0,0,,,,Short,Ballelina( 2023),False,1.533333
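rename can also be used without inplace=True, in which case it returns a renamed copy. By default it silently ignores labels that do not exist; passing errors="raise" turns a misspelled label into a KeyError. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data using the notebook's original column names
toy = pd.DataFrame({"title": ["Diner"], "year": [1982]})

# Without inplace=True, rename returns a new DataFrame
renamed = toy.rename(columns={"year": "release_year", "title": "movie_title"})

# errors="raise" reports labels that are not present ('yeer' is a deliberate typo)
try:
    toy.rename(columns={"yeer": "release_year"}, errors="raise")
    missing_label_error = None
except KeyError as exc:
    missing_label_error = exc

print(list(renamed.columns))
print(type(missing_label_error).__name__)
```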


# Display Options

In [111]:
# Set maximum number of rows and columns to display
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 5)

In [None]:
df

Unnamed: 0_level_0,movie_title,release_year,...,is_older,duration_hours
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1982,...,True,1.266667
3,18 anni tra una settimana,1991,...,True,1.633333
17,Ride a Wild Pony,1976,...,True,1.516667
...,...,...,...,...,...
232915,Konferensen,2023,...,False,1.666667
232919,Ballelina,2023,...,False,1.533333
232920,Invitación a un Asesinato,2023,...,False,1.533333


In [112]:
# Reset Options
pd.reset_option('display')

In [113]:
df

Unnamed: 0_level_0,filmtv_id,movie_title,release_year,genre,duration,country,directors,actors,avg_vote,critics_vote,...,effort,tension,erotism,classic,top_rated,must_watch,length_category,title_year,is_older,duration_hours
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,2,Bugs Bunny's Third Movie: 1001 Rabbit Tales,1983,Animation,76,United States,"David Detiege, Art Davis, Bill Perez",,7.7,8.00,...,0,0,0,True,True,True,Short,Bugs Bunny's Third Movie: 1001 Rabbit Tales( 1...,True,1.266667
1,3,18 anni tra una settimana,1991,Drama,98,Italy,Luigi Perelli,"Kim Rossi Stuart, Simona Cavallari, Ennio Fant...",6.5,6.00,...,0,2,0,True,True,True,Short,18 anni tra una settimana( 1991),True,1.633333
2,17,Ride a Wild Pony,1976,Romantic,91,United States,Don Chaffey,"Michael Craig, John Meillon, Eva Griffith, Gra...",5.7,6.00,...,1,0,0,True,,,Short,Ride a Wild Pony( 1976),True,1.516667
3,18,Diner,1982,Comedy,95,United States,Barry Levinson,"Mickey Rourke, Steve Guttenberg, Ellen Barkin,...",7.0,8.00,...,0,1,2,True,True,True,Short,Diner( 1982),True,1.583333
4,20,A che servono questi quattrini?,1942,Comedy,85,Italy,Esodo Pratelli,"Eduardo De Filippo, Peppino De Filippo, Clelia...",5.9,5.33,...,1,0,0,True,,,Short,A che servono questi quattrini?( 1942),True,1.416667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41394,232817,Gold Digger Killer,2021,Thriller,87,"Canada, United States",Robin Hays,"Julie Benz, Roan Curtis, Georgia Bradner, Eli ...",4.0,,...,0,0,0,,,,Short,Gold Digger Killer( 2021),False,1.450000
41395,232893,Addio al nubilato 2,2023,Comedy,90,Italy,Francesco Apolloni,"Laura Chiatti, Chiara Francini, Antonia Liskov...",2.7,,...,0,0,0,,,,Short,Addio al nubilato 2( 2023),False,1.500000
41396,232915,Konferensen,2023,Horror,100,Sweden,Patrik Eklund,"Katia Winter, Eva Melander, Lola Zackow, Adam ...",6.0,,...,0,0,0,,,,Short,Konferensen( 2023),False,1.666667
41397,232919,Ballelina,2023,Thriller,92,South Korea,Chung-Hyun Lee,"Jeon Jong-seo, Park Yu-rim, Ji-hun Kim",5.8,,...,0,0,0,,,,Short,Ballelina( 2023),False,1.533333
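When the display limits are only needed for one cell, pd.option_context applies them temporarily and restores the previous values on exit, so no explicit reset is required. A minimal sketch with a hypothetical 20-row frame:

```python
import pandas as pd

toy = pd.DataFrame({"n": range(20)})  # hypothetical toy data
rows_before = pd.get_option("display.max_rows")

# Options set here apply only inside the with block
with pd.option_context("display.max_rows", 7, "display.max_columns", 5):
    rows_inside = pd.get_option("display.max_rows")
    print(toy)  # printed in truncated form

# After the block the earlier setting is back
rows_after = pd.get_option("display.max_rows")
print(rows_inside, rows_after)
```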


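# Grouping Data
Grouping splits a dataset into chunks that share a value in one or more columns, so that each segment can be analyzed separately. groupby is lazy: it returns a DataFrameGroupBy object and computes nothing until you iterate over the groups or aggregate them.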

In [114]:
genre_groups = df.groupby('genre')  # Groups the data by the 'genre' column
genre_groups

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000218A94A5160>

In [115]:
for genre, group_data in genre_groups:
    print(f"Genre: {genre}")
    print(group_data)
    print()

Genre: Action
          filmtv_id           movie_title  release_year   genre  duration  \
movie_id                                                                    
15               38          Mr. Majestyk          1974  Action       105   
109             193           Airport '77          1977  Action       110   
342             530   Wings of the Apache          1990  Action        86   
369             574  Geheimcode Wildgänse          1985  Action        93   
...             ...                   ...           ...     ...       ...   
41308        225561    Tin joek yau ching          1990  Action        87   
41314        226116          Blood & Gold          2023  Action       100   
41337        228408          Kung Fu Girl          2021  Action        98   
41343        229097               Hit Man          2023  Action       113   
41378        231061            Sentinelle          2023  Action        99   

                country                         directors  \


In [117]:
year_genre_groups = df.groupby(['release_year', 'genre'])  # Groups by year and genre

In [118]:
year_genre_groups

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000218A94A78C0>

# Aggregation

After grouping, you might want to perform aggregation operations like sum, mean, count, etc., to summarize the data.

In [120]:
# Simple Aggregation - Calculating the Average
avg_duration_by_genre = df.groupby('genre')['duration'].mean()  # Average duration per genre
print(avg_duration_by_genre)

genre
Action             101.682599
Adventure          102.870079
Animation           88.925651
Biblical           137.210526
Biography          120.602920
Comedy              97.881409
Crime               98.939914
Documentary         92.641638
Drama              105.271266
Erotico             90.893617
Fantasy            105.612326
Gangster           117.803571
Grotesque           99.851852
History            127.159420
Horror              92.603867
Musical            106.240654
Mythology           97.969697
Mélo               107.836538
Noir                98.428571
Romantic            94.778571
Sci-Fi             102.097238
Short Movie        210.500000
Sperimental         97.716049
Sport               95.428571
Spy                105.333333
Stand-up Comedy     73.000000
Super-hero         121.855422
Thriller            97.957849
War                111.306173
Western             95.800175
Name: duration, dtype: float64


In [121]:
# Multiple Aggregations on a Single Column
stats_by_genre = df.groupby('genre')['avg_vote'].agg(['mean', 'std', 'min', 'max'])  # string names avoid the FutureWarning raised for NumPy callables
print(stats_by_genre)


                     mean       std  min   max
genre                                         
Action           5.123898  1.399433  1.0   9.0
Adventure        5.545866  1.324040  1.4   9.3
Animation        6.118030  1.281681  1.6   9.3
Biblical         5.184211  1.273753  2.9   8.2
Biography        5.949051  1.208300  1.0  10.0
Comedy           5.482645  1.355306  1.0  10.0
Crime            6.142918  1.167942  2.7   8.8
Documentary      6.564749  1.180058  1.0  10.0
Drama            6.292677  1.302034  1.0  10.0
Erotico          4.052766  1.323337  1.0   7.9
Fantasy          5.486282  1.322481  1.3   9.4
Gangster         6.614286  1.383333  2.5   9.0
Grotesque        6.458848  1.250535  2.3   9.1
History          5.887681  1.245398  2.7   8.8
Horror           5.225874  1.301247  1.3   9.1
Musical          6.016121  1.474926  1.8   9.0
Mythology        4.631818  1.298183  2.2   8.0
Mélo             6.715385  1.256731  3.6   8.6
Noir             6.985294  1.104915  3.4   9.5
Romantic     



In [122]:
# Different Aggregations for Different Columns
complex_aggregation = df.groupby('genre').agg({
    'duration': 'mean',  # average duration
    'avg_vote': ['min', 'max'],  # min and max average votes
    'public_vote': 'sum'  # total of public votes
})



In [124]:
complex_aggregation

Unnamed: 0_level_0,duration,avg_vote,avg_vote,public_vote
Unnamed: 0_level_1,mean,min,max,sum
genre,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Action,101.682599,1.0,9.0,11369.0
Adventure,102.870079,1.4,9.3,8747.0
Animation,88.925651,1.6,9.3,6639.0
Biblical,137.210526,2.9,8.2,202.0
Biography,120.60292,1.0,10.0,4136.0
Comedy,97.881409,1.0,10.0,51634.0
Crime,98.939914,2.7,8.8,2953.0
Documentary,92.641638,1.0,10.0,13371.0
Drama,105.271266,1.0,10.0,76234.0
Erotico,90.893617,1.0,7.9,981.0
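The result above has two header levels because at least one column was given a list of functions. If flat column names are easier to work with, the levels can be joined. This is one possible approach, sketched on hypothetical toy data; the joined names are an illustration, not something the notebook defines.

```python
import pandas as pd

# Hypothetical toy data with the notebook's column names
toy = pd.DataFrame({
    "genre": ["Comedy", "Comedy", "Drama"],
    "duration": [95, 85, 98],
    "avg_vote": [7.0, 5.9, 6.5],
})

agg = toy.groupby("genre").agg({
    "duration": "mean",
    "avg_vote": ["min", "max"],
})

# Each column label is a (column, function) pair; join the pair with '_'
flat = agg.copy()
flat.columns = ["_".join(pair) for pair in agg.columns]

print(list(flat.columns))
```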


**Aggregating Without Grouping:**
Aggregations can also be applied to the whole DataFrame, with no grouping step.

In [125]:
# Overall Summary Statistics
overall_stats = df[['duration', 'avg_vote']].describe()
overall_stats

Unnamed: 0,duration,avg_vote
count,41399.0,41399.0
mean,100.537163,5.801522
std,27.260962,1.403861
min,41.0,1.0
25%,90.0,4.8
50%,96.0,5.9
75%,107.0,6.9
max,1525.0,10.0
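describe computes a fixed set of statistics. When only a few are needed, DataFrame.agg accepts a mapping from column to functions, just as it does after groupby. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with the notebook's column names
toy = pd.DataFrame({"duration": [76, 98, 91], "avg_vote": [7.7, 6.5, 5.7]})

# Choose different statistics per column; cells that were not requested are NaN
picked = toy.agg({"duration": ["min", "max"], "avg_vote": ["mean"]})

print(picked)
```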


**df.count():** Returns the number of non-null values in each column of the DataFrame.

In [126]:
# Create a DataFrame
data = {'A': [1, 2, None], 'B': [4, None, 6], 'C': [5, 8, 9]}
data_df = pd.DataFrame(data)

# Count non-null values in each column
counts = data_df.count()
print(counts)

A    2
B    2
C    3
dtype: int64
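Two closely related calls are sketched below on the same data: isna().sum() counts the missing values per column, and count(axis=1) counts the non-null values per row instead of per column.

```python
import pandas as pd

# Same data as in the cell above
data_df = pd.DataFrame({"A": [1, 2, None], "B": [4, None, 6], "C": [5, 8, 9]})

# Missing values per column (the complement of count)
missing_per_column = data_df.isna().sum()

# Non-null values per row
non_null_per_row = data_df.count(axis=1)

print(missing_per_column)
print(non_null_per_row)
```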


**df['column'].value_counts():** Returns the frequency of each unique value in a Series, sorted from most to least frequent. It is typically called on a single DataFrame column to inspect the distribution of values in that column.

In [127]:
# Value Counts
df['movie_title'].value_counts()

movie_title
Les Vampires                     10
Pinocchio                         8
Riget II                          6
The Hound of the Baskervilles     5
Little Women                      5
                                 ..
Gold Digger Killer                1
Addio al nubilato 2               1
Konferensen                       1
Ballelina                         1
Diner                             1
Name: count, Length: 39531, dtype: int64
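value_counts takes a few options that are often useful when looking at a distribution. The sketch below uses a hypothetical Series with one missing entry to show normalize=True (relative frequencies) and dropna=False (include missing values in the count).

```python
import pandas as pd

# Hypothetical toy Series with one missing value
genres = pd.Series(["Comedy", "Drama", "Comedy", None])

counts = genres.value_counts()                    # missing values are skipped
shares = genres.value_counts(normalize=True)      # fractions of non-missing rows
with_missing = genres.value_counts(dropna=False)  # missing values get their own row

print(counts)
print(shares)
print(with_missing)
```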

**Custom Aggregation Functions:**
Pandas allows you to define and use custom aggregation functions for more specific data analysis needs.

In [128]:
# Using a Custom Function for Aggregation
def range_func(series):
    return series.max() - series.min()

range_by_genre = df.groupby('genre')['duration'].agg(range_func)  # Range of durations by genre
range_by_genre

genre
Action              197
Adventure           355
Animation          1129
Biblical            330
Biography           267
Comedy              558
Crime               135
Documentary         859
Drama              1484
Erotico              61
Fantasy             202
Gangster            245
Grotesque           236
History             560
Horror              213
Musical             247
Mythology           104
Mélo                151
Noir                110
Romantic            150
Sci-Fi              259
Short Movie         487
Sperimental         774
Sport                34
Spy                 115
Stand-up Comedy      27
Super-hero          101
Thriller            550
War                 655
Western             145
Name: duration, dtype: int64
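For a one-off calculation, the same range can be written as a lambda instead of a named function. The sketch below runs both forms on hypothetical toy data to show that they give the same result.

```python
import pandas as pd

# Hypothetical toy data with the notebook's column names
toy = pd.DataFrame({
    "genre": ["Comedy", "Comedy", "Drama", "Drama"],
    "duration": [95, 85, 98, 120],
})

def range_func(series):
    # Difference between the longest and shortest duration in the group
    return series.max() - series.min()

by_function = toy.groupby("genre")["duration"].agg(range_func)
by_lambda = toy.groupby("genre")["duration"].agg(lambda s: s.max() - s.min())

print(by_function)
```

A named function has the advantage of a readable label when it is used in a list of aggregations; a lambda shows up as a generic name there.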

**Renaming Grouped Aggregation Results:**
It is often useful to rename the results of aggregations for clarity or further analysis.

In [129]:
# Renaming Aggregation Results
renamed_aggregations = df.groupby('genre')['avg_vote'].agg([
    ('Average Rating', 'mean'),  # Renames the mean result to 'Average Rating'
    ('Rating Standard Deviation', 'std')  # Renames the std result to 'Rating Standard Deviation'
])
renamed_aggregations

Unnamed: 0_level_0,Average Rating,Rating Standard Deviation
genre,Unnamed: 1_level_1,Unnamed: 2_level_1
Action,5.123898,1.399433
Adventure,5.545866,1.32404
Animation,6.11803,1.281681
Biblical,5.184211,1.273753
Biography,5.949051,1.2083
Comedy,5.482645,1.355306
Crime,6.142918,1.167942
Documentary,6.564749,1.180058
Drama,6.292677,1.302034
Erotico,4.052766,1.323337
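Another way to get the same column names is keyword-style named aggregation, where each output name is given as a keyword and the function as its value. Because the names here contain spaces, the sketch passes them through a dict unpacked with **. The data is a hypothetical toy frame.

```python
import pandas as pd

# Hypothetical toy data with the notebook's column names
toy = pd.DataFrame({
    "genre": ["Comedy", "Comedy", "Drama", "Drama"],
    "avg_vote": [7.0, 5.0, 6.5, 6.5],
})

# Output column name -> aggregation function
renamed = toy.groupby("genre")["avg_vote"].agg(**{
    "Average Rating": "mean",
    "Rating Standard Deviation": "std",
})

print(renamed)
```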
