### What factors contribute to the widespread adoption and popularity of Pandas as a data analysis library?
Pandas is an open-source data analysis library for the Python programming language, built on top of NumPy.

Pandas excels at performing complex operations on large in-memory data sets with a concise syntax.

Competitors to pandas include the graphical spreadsheet application Excel, the statistical programming language R, and the SAS software suite.

Programming with pandas requires a different skill set than working in Excel or Google Sheets.

Pandas can import a variety of file formats, such as CSV, Excel workbooks, and JSON. A popular format is CSV (comma-separated values), which separates rows with line breaks and the values within each row with commas.

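If a field itself contains a comma, it has to be wrapped in double quotes; otherwise the parser splits it and every later value lands in the wrong column. A minimal sketch, assuming a hypothetical two-row sample with the columns `Rank`, `Title`, `Studio`, `Gross` and `Year` (the sample text and the `movies` name are illustrative, not the real contents of MoviesData.csv), that reads quoted fields and then turns the dollar amounts into numbers:

```python
import io

import pandas as pd

# Hypothetical sample: fields that contain commas are wrapped in double quotes
csv_text = """Rank,Title,Studio,Gross,Year
1,Avengers Endgame,Buena Vista,"$2,796.30",2019
679,"Honey, I Shrunk the Kids",Buena Vista,$222.70,1989
"""

movies = pd.read_csv(io.StringIO(csv_text))

# Gross is read as text: strip "$" and the thousands separator, then convert to float
movies["Gross"] = movies["Gross"].str.replace(r"[$,]", "", regex=True).astype(float)

print(movies)
print(movies.dtypes)
```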
The DataFrame is the primary data structure in pandas. It is a two-dimensional table of data with labeled rows (the index) and labeled columns.

The Series is a one-dimensional labeled array. Think of it as a single column of data.

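A small sketch of both structures, using a few rows from the movie data as hypothetical sample values; `df` and `years` are illustrative names:

```python
import pandas as pd

# A DataFrame built from a dict of columns, with the rank used as the row label
df = pd.DataFrame(
    {
        "Title": ["Avengers Endgame", "Avatar", "Titanic"],
        "Studio": ["Buena Vista", "Fox", "Paramount"],
        "Year": [2019, 2009, 1997],
    },
    index=[1, 2, 3],
)

# Selecting a single column returns a Series that shares the DataFrame's index
years = df["Year"]

print(type(df).__name__, df.shape)       # DataFrame (3, 3)
print(type(years).__name__, years.ndim)  # Series 1
print(years.index.tolist())              # [1, 2, 3]
```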
We can access a row in a Series or DataFrame by its position (the row number, with `iloc`) or by its index label (with `loc`).

We can sort a DataFrame by the values in one or more columns.

We can use logical conditions to extract subsets of rows from a DataFrame.

We can group DataFrame rows into buckets based on a column's values, then perform aggregate operations such as sums on the resulting groups.

In [1]:
```python
# Import the pandas library
import pandas as pd
```

### 1.1 Data Loading 

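In [2]:
```python
# Load a CSV data file. Judging by the outputs below, unquoted commas in the Gross
# values (e.g. $2,796.3) misalign the columns: 'Rank' holds titles, 'Title' studios
data = pd.read_csv("MoviesData.csv")
```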

In [3]:
```python
# Display the first 5 rows of the data
data.head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 2 | Avatar | Fox | \$2 | 789.7 | 2009.0 |
| 3 | Titanic | Paramount | \$2 | 187.5 | 1997.0 |
| 4 | Star Wars The Force Awakens | Buena Vista | \$2 | 68.2 | 2015.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |


In [4]:
```python
# Display the last 5 rows of the data
data.tail()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 778 | Yogi Bear | Warner Brothers | \$201.60 | 2010 | NaN |
| 779 | Garfield The Movie | Fox | \$200.80 | 2004 | NaN |
| 780 | Cats & Dogs | Warner Brothers | \$200.70 | 2001 | NaN |
| 781 | The Hunt for Red October | Paramount | \$200.50 | 1990 | NaN |
| 782 | Valkyrie | MGM | \$200.30 | 2008 | NaN |


In [5]:
```python
# Check the total number of rows and columns
data.shape
```

```
(782, 5)
```

In [6]:
```python
# Total number of values in the DataFrame, i.e. number of rows * number of columns
data.size
```

```
3910
```

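`size` is simply the product of the two numbers returned by `shape` (782 × 5 = 3910 here), whereas `len()` counts only rows. A quick check on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical 4-row, 3-column frame
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

rows, cols = df.shape
print(rows, cols, df.size)     # 4 3 12
print(df.size == rows * cols)  # True
print(len(df))                 # 4 (len() counts rows only)
```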
In [7]:
```python
# Check each column's dtype and non-null count
# (object usually indicates str values, float64 indicates decimal numbers)
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 782 entries, 1 to 782
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Rank    782 non-null    object 
 1   Title   782 non-null    object 
 2   Studio  782 non-null    object 
 3   Gross   782 non-null    object 
 4   Year    45 non-null     float64
dtypes: float64(1), object(4)
memory usage: 36.7+ KB
```

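`info()` shows that Gross is stored as `object` (text) and that Year has only 45 non-null values, so these columns need cleaning before any numeric work. One option is `pd.to_numeric` with `errors="coerce"`, which turns anything it cannot parse into `NaN` instead of raising. A minimal sketch on hypothetical values shaped like the Gross column:

```python
import pandas as pd

# Hypothetical text values similar to the Gross column
gross = pd.Series(["$2,796.30", "$222.70", "n/a"])

# Remove "$" and "," first, then convert; "n/a" cannot be parsed and becomes NaN
cleaned = pd.to_numeric(gross.str.replace(r"[$,]", "", regex=True), errors="coerce")

print(cleaned.dtype)         # float64
print(cleaned.isna().sum())  # 1
```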

### 1.2 Accessing records in a DataFrame

In [8]:
```python
data.head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 2 | Avatar | Fox | \$2 | 789.7 | 2009.0 |
| 3 | Titanic | Paramount | \$2 | 187.5 | 1997.0 |
| 4 | Star Wars The Force Awakens | Buena Vista | \$2 | 68.2 | 2015.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |


In [9]:
```python
# 1. Using loc: select a row by its index label (label 1 is the first row here)
data.loc[1]
```

```
Rank      Avengers Endgame
Title          Buena Vista
Studio                  $2
Gross              796.30 
Year                  2019
Name: 1, dtype: object
```

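In [10]:
```python
# 2. Using iloc: select by integer position; the part before the comma selects rows,
# the part after it selects columns. 0:1 is the first row only (the end is excluded)
data.iloc[0:1, :]
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |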

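Because this DataFrame's index starts at 1, `loc[1]` (label 1) and `iloc[0]` (position 0) point at the same first row, and the two accessors treat slice end points differently. A small sketch with a hypothetical Series:

```python
import pandas as pd

s = pd.Series(["Avengers Endgame", "Avatar", "Titanic"], index=[1, 2, 3])

print(s.loc[1])          # Avengers Endgame (label 1)
print(s.iloc[1])         # Avatar (position 1, i.e. the second row)
print(len(s.loc[1:2]))   # 2 (label slices include the end label)
print(len(s.iloc[1:2]))  # 1 (position slices exclude the end position)
```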

### 1.3 Sorting Values in DataFrame

In [11]:
```python
# Sort by one column in ascending order (rows with a missing Year are placed last)
data.sort_values(by='Year', ascending=True).head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 679 | Honey | I Shrunk the Kids | Buena Vista | \$222.70 | 1989.0 |
| 33 | Jurassic Park | Universal | \$1 | 029.50 | 1993.0 |
| 3 | Titanic | Paramount | \$2 | 187.50 | 1997.0 |
| 35 | Star Wars Episode I - The Phantom Menace | Fox | \$1 | 027.00 | 1999.0 |
| 724 | Crouching Tiger | Hidden Dragon | Sony | \$213.50 | 2000.0 |


In [12]:
```python
# Sort by one column in descending order
data.sort_values(by='Year', ascending=False).head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 22 | Captain Marvel | Buena Vista | \$1 | 128.3 | 2019.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |
| 21 | Aquaman | Warner Brothers | \$1 | 148.0 | 2018.0 |
| 16 | Incredibles 2 | Buena Vista | \$1 | 242.8 | 2018.0 |


In [13]:
```python
# Sort by two columns, both ascending: Year first, then Gross to break ties
data.sort_values(by=['Year', 'Gross'], ascending=[True, True]).head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 679 | Honey | I Shrunk the Kids | Buena Vista | \$222.70 | 1989.0 |
| 33 | Jurassic Park | Universal | \$1 | 029.50 | 1993.0 |
| 3 | Titanic | Paramount | \$2 | 187.50 | 1997.0 |
| 35 | Star Wars Episode I - The Phantom Menace | Fox | \$1 | 027.00 | 1999.0 |
| 724 | Crouching Tiger | Hidden Dragon | Sony | \$213.50 | 2000.0 |


In [14]:
```python
# Sort by two columns in different orders: Year descending, then Gross ascending
data.sort_values(by=['Year', 'Gross'], ascending=[False, True]).head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 22 | Captain Marvel | Buena Vista | \$1 | 128.3 | 2019.0 |
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |
| 21 | Aquaman | Warner Brothers | \$1 | 148.0 | 2018.0 |
| 16 | Incredibles 2 | Buena Vista | \$1 | 242.8 | 2018.0 |


In [15]:
```python
# Sort by the index (row labels)
data.sort_index().head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 2 | Avatar | Fox | \$2 | 789.7 | 2009.0 |
| 3 | Titanic | Paramount | \$2 | 187.5 | 1997.0 |
| 4 | Star Wars The Force Awakens | Buena Vista | \$2 | 68.2 | 2015.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |

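Gross is stored as text, so sorting on it compares strings character by character rather than amounts. Since pandas 1.1, `sort_values` accepts a `key` function that is applied to the column before sorting. A sketch on hypothetical values; `to_number` is an illustrative helper, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({"Title": ["A", "B", "C"], "Gross": ["$222.70", "$1,029.50", "$95.00"]})


def to_number(col: pd.Series) -> pd.Series:
    """Strip '$' and ',' and convert the text to floats."""
    return col.str.replace(r"[$,]", "", regex=True).astype(float)


# String sort: "$1..." < "$2..." < "$9..." regardless of the actual amount
print(df.sort_values("Gross")["Title"].tolist())                 # ['B', 'A', 'C']

# Numeric sort: the key converts the column to floats before comparing
print(df.sort_values("Gross", key=to_number)["Title"].tolist())  # ['C', 'A', 'B']
```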

### 1.4 Counting Values and Finding Unique Values in a Series

In [17]:
```python
# Count how often each value occurs ('Title' holds the studio names here)
data['Title'].value_counts()
```

```
Warner Brothers                131
Buena Vista                    122
Fox                            116
Universal                      109
Sony                            85
Paramount                       76
Dreamworks                      27
Lionsgate                       21
New Line                        16
MGM                             11
TriStar                         11
Miramax                         10
Weinstein                        6
Columbia                         5
WGUSA                            4
Orion                            2
SonR                             2
Dimension                        2
Polygram                         2
 Inc.                            1
Focus                            1
HC                               1
USA                              1
Artisan                          1
RKO                              1
Newmarket                        1
 Robot                           1
Lions                            1
CL
```

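Passing `normalize=True` to `value_counts()` returns each value's share of the total instead of a raw count. A small example with hypothetical studio names:

```python
import pandas as pd

studios = pd.Series(["Fox", "Paramount", "Fox", "Sony"])

counts = studios.value_counts()                # raw counts per value
shares = studios.value_counts(normalize=True)  # proportions that sum to 1

print(counts["Fox"])  # 2
print(shares["Fox"])  # 0.5
```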
In [18]:
```python
# Array of the distinct values ('Rank' holds the movie titles here)
data['Rank'].unique()
```

```
array(['Avengers Endgame', 'Avatar', 'Titanic',
       'Star Wars The Force Awakens', 'Avengers Infinity War',
       'Jurassic World', "Marvel's The Avengers", 'Furious 7',
       'Avengers Age of Ultron', 'Black Panther',
       'Harry Potter and the Deathly Hallows Part 2',
       'Star Wars The Last Jedi', 'Jurassic World Fallen Kingdom',
       'Frozen', 'Beauty and the Beast', 'Incredibles 2',
       'The Fate of the Furious', 'Iron Man 3', 'Minions',
       'Captain America Civil War', 'Aquaman', 'Captain Marvel',
       'Transformers Dark of the Moon',
       'The Lord of the Rings The Return of the King', 'Skyfall',
       'Transformers Age of Extinction', 'The Dark Knight Rises',
       'Toy Story 3', "Pirates of the Caribbean Dead Man's Chest",
       'Rogue One A Star Wars Story',
       'Pirates of the Caribbean On Stranger Tides', 'Despicable Me 3',
       'Jurassic Park', 'Finding Dory',
       'Star Wars Episode I - The Phantom Menace', 'Alice in Wonderland',
       'Zo
```

In [19]:
```python
# Count the number of unique values in a Series
data['Rank'].nunique()
```

```
773
```

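`unique()` keeps missing values in its result, while `nunique()` skips them unless `dropna=False` is passed, so the two can disagree on a column with gaps such as Year. A sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

years = pd.Series([2019.0, 2009.0, np.nan, 2019.0])

print(len(years.unique()))          # 3 (NaN is included in the array)
print(years.nunique())              # 2 (NaN is excluded by default)
print(years.nunique(dropna=False))  # 3 (NaN counted as a value)
```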
### 1.5 Filtering Values in DataFrame

In [22]:
```python
data.head()
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 1 | Avengers Endgame | Buena Vista | \$2 | 796.3 | 2019.0 |
| 2 | Avatar | Fox | \$2 | 789.7 | 2009.0 |
| 3 | Titanic | Paramount | \$2 | 187.5 | 1997.0 |
| 4 | Star Wars The Force Awakens | Buena Vista | \$2 | 68.2 | 2015.0 |
| 5 | Avengers Infinity War | Buena Vista | \$2 | 48.4 | 2018.0 |


In [21]:
```python
# Filter on one column: keep rows whose 'Title' (the studio here) is 'Paramount'
data[data['Title'] == 'Paramount']
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 3 | Titanic | Paramount | \$2 | 187.50 | 1997.0 |
| 23 | Transformers Dark of the Moon | Paramount | \$1 | 123.80 | 2011.0 |
| 26 | Transformers Age of Extinction | Paramount | \$1 | 104.10 | 2014.0 |
| 73 | Transformers Revenge of the Fallen | Paramount | \$836.30 | 2009 | NaN |
| 85 | Mission Impossible - Fallout | Paramount | \$791.10 | 2018 | NaN |
| ... | ... | ... | ... | ... | ... |
| 752 | Paranormal Activity 3 | Paramount | \$207.00 | 2011 | NaN |
| 754 | Sleepy Hollow | Paramount | \$206.10 | 1999 | NaN |
| 767 | Vanilla Sky | Paramount | \$203.40 | 2001 | NaN |
| 768 | Arrival | Paramount | \$203.40 | 2016 | NaN |

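To keep rows that match any of several values, `isin()` avoids chaining `==` tests with `|`, and `~` inverts the resulting mask. A sketch on a hypothetical frame laid out the way the file was presumably meant to be read (Title is the movie, Studio the studio):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Title": ["Titanic", "Avatar", "Aquaman"],
        "Studio": ["Paramount", "Fox", "Warner Brothers"],
    }
)

mask = df["Studio"].isin(["Paramount", "Fox"])  # boolean Series, one value per row
print(df[mask]["Title"].tolist())               # ['Titanic', 'Avatar']
print(df[~mask]["Title"].tolist())              # ['Aquaman'] (~ negates the mask)
```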

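In [25]:
```python
# Filter on two columns: wrap each condition in parentheses and combine them with &
# 'Studio' holds strings here, so <= '$10' is a lexicographic, not numeric, comparison
data[(data['Title'] == 'Paramount') & (data['Studio'] <= '$10')]
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 23 | Transformers Dark of the Moon | Paramount | \$1 | 123.8 | 2011.0 |
| 26 | Transformers Age of Extinction | Paramount | \$1 | 104.1 | 2014.0 |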

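Because the compared column holds strings, `<= '$10'` above is a lexicographic comparison, not a dollar threshold. A numeric filter needs numeric values first; this sketch assumes a hypothetical `Gross` column already converted to millions of dollars (values taken from rows shown above):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Title": ["Titanic", "Transformers Dark of the Moon", "Vanilla Sky"],
        "Studio": ["Paramount", "Paramount", "Paramount"],
        "Gross": [2187.5, 1123.8, 203.4],  # assumed already numeric, in millions
    }
)

# Each condition in its own parentheses, combined with & (and); | would mean "or"
hits = df[(df["Studio"] == "Paramount") & (df["Gross"] >= 1000)]
print(hits["Title"].tolist())  # ['Titanic', 'Transformers Dark of the Moon']
```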

In [30]:
```python
# Filter on text: case-insensitive substring match on 'Rank' (the titles here)
data[data['Rank'].str.lower().str.contains('dark')]
```

|   | Rank | Title | Studio | Gross | Year |
|---|---|---|---|---|---|
| 23 | Transformers Dark of the Moon | Paramount | \$1 | 123.8 | 2011.0 |
| 27 | The Dark Knight Rises | Warner Brothers | \$1 | 84.9 | 2012.0 |
| 39 | The Dark Knight | Warner Brothers | \$1 | 4.9 | 2008.0 |
| 132 | Thor The Dark World | Buena Vista | \$644.60 | 2013.0 | NaN |
| 232 | Star Trek Into Darkness | Paramount | \$467.40 | 2013.0 | NaN |
| 309 | Fifty Shades Darker | Universal | \$381.50 | 2017.0 | NaN |
| 600 | Dark Shadows | Warner Brothers | \$245.50 | 2012.0 | NaN |
| 603 | Dark Phoenix | Fox | \$245.10 | 2019.0 | NaN |

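`str.contains` can perform the case-insensitive match itself with `case=False`, and `na=False` treats missing text as a non-match so the mask stays purely boolean. The pattern is interpreted as a regular expression by default. A sketch on hypothetical titles:

```python
import pandas as pd

titles = pd.Series(["The Dark Knight", "Dark Phoenix", None, "Frozen"])

# case=False ignores letter case; na=False turns the missing title into False
mask = titles.str.contains("dark", case=False, na=False)
print(titles[mask].tolist())  # ['The Dark Knight', 'Dark Phoenix']
```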

### 1.6 Grouping Data

In [33]:
```python
# Group by one column and count the non-null values in each remaining column
data.groupby('Title').count()
```

| Title | Rank | Studio | Gross | Year |
|---|---|---|---|---|
| Hidden Dragon | 1 | 1 | 1 | 1 |
| I Shrunk the Kids | 1 | 1 | 1 | 1 |
| Inc. | 1 | 1 | 1 | 1 |
| Robot | 1 | 1 | 1 | 1 |
| the Witch and the Wardrobe | 1 | 1 | 1 | 1 |
| 000 B.C. | 1 | 1 | 1 | 1 |
| Artisan | 1 | 1 | 1 | 0 |
| Buena Vista | 122 | 122 | 122 | 20 |
| CL | 1 | 1 | 1 | 0 |
| China Film Corporation | 1 | 1 | 1 | 0 |

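`count()` reports non-null values per column, which is why Buena Vista shows 20 in the Year column against 122 in the others. To count rows per group regardless of missing values, `size()` can be used instead. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Studio": ["Fox", "Fox", "Sony"],
        "Year": [2009.0, np.nan, 2002.0],
    }
)

# count(): Fox -> 1 because its NaN Year is skipped; Sony -> 1
print(df.groupby("Studio")["Year"].count())
# size(): Fox -> 2, Sony -> 1, counting every row in the group
print(df.groupby("Studio").size())
```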

In [36]:
```python
# Group and name the aggregated columns in one step (named aggregation)
data.groupby('Title').agg(movie_count=('Rank', 'count'), Latest_year=('Year', 'max'))
```

| Title | movie_count | Latest_year |
|---|---|---|
| Hidden Dragon | 1 | 2000.0 |
| I Shrunk the Kids | 1 | 1989.0 |
| Inc. | 1 | 2001.0 |
| Robot | 1 | 2004.0 |
| the Witch and the Wardrobe | 1 | 2005.0 |
| 000 B.C. | 1 | 2008.0 |
| Artisan | 1 | NaN |
| Buena Vista | 122 | 2019.0 |
| CL | 1 | NaN |
| China Film Corporation | 1 | NaN |

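The introduction mentions sums over groups; once Gross is numeric, the same named-aggregation pattern can total and average each group. A sketch assuming a hypothetical, already-cleaned frame (gross in millions of dollars, values taken from rows shown above):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Studio": ["Buena Vista", "Buena Vista", "Fox"],
        "Gross": [2796.3, 1128.3, 2789.7],  # assumed already numeric, in millions
    }
)

# Each keyword becomes an output column: (source column, aggregation function)
summary = df.groupby("Studio").agg(
    total_gross=("Gross", "sum"),
    mean_gross=("Gross", "mean"),
    movie_count=("Gross", "count"),
)
print(summary)
```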