![image.png](attachment:image.png)

![image.png](attachment:image.png)

## Things we can do in Pandas

1. **Data Loading** - Read data from various sources such as `csv`, `excel`, `sql databases`
2. **Data Inspection** - Quickly inspect and understand the data structure and content of your data using functions `head()`, `tail()`, `describe()`, `info()`
3. **Data Cleaning / Processing** - Identify and fix missing values, handle duplicates, and transform data into the desired format
4. **Data Selection and Filtering** - Slice and filter data with the indexers `.loc[]` (label-based) and `.iloc[]` (position-based)
5. **Data Manipulation** - Perform various data manipulation tasks such as sorting, grouping, aggregating, merging and pivoting using functions like `groupby()`, `agg()`, `merge()`, and `pivot_table()`
6. **Data Visualization** - Create quick plots directly from Series and DataFrames (e.g., with `.plot()`)

- pandas is a Python library (package) and one of the most important tools for data analysts and data scientists
- Many ML and visualization libraries, such as scikit-learn and seaborn, accept pandas objects directly as input
    - **pandas is the backbone of most data projects**

## Pandas Data Structure

#### `Series` & `DataFrame`

- A Series is essentially a single `column` (a one-dimensional labeled array), and a DataFrame is a two-dimensional table made up of a collection of Series that share the same index

![image.png](attachment:image.png)
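A minimal sketch of that relationship, using a small made-up DataFrame (the `scores` data, its column names and row labels are hypothetical): selecting a single column of a DataFrame gives back a Series.

```python
import pandas as pd

# Hypothetical toy data: two columns, three rows, labeled by student name
scores = pd.DataFrame({"math": [90, 75, 60], "english": [80, 85, 70]},
                      index=["Asha", "Ben", "Chen"])

math_col = scores["math"]  # selecting one column returns a Series

print(type(scores))                  # <class 'pandas.core.frame.DataFrame'>
print(type(math_col))                # <class 'pandas.core.series.Series'>
print(scores.shape, math_col.shape)  # (3, 2) (3,)
```

The column keeps the DataFrame's row labels as its own index, which is what lets several Series line up as one table.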

In [1]:
import numpy as np
import os
import pandas as pd

### Create Series

#### 1. From `ndarray`

In [2]:
arr1 = np.random.randint(10,100,10)

In [3]:
arr1

array([83, 98, 13, 80, 65, 43, 18, 61, 60, 98])

#### `pd.Series()` - creates a Series from array, list or other Python objects
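Besides NumPy arrays, `pd.Series()` accepts other inputs. A short sketch (the variable names are illustrative only) of a list, a dictionary and a single scalar value:

```python
import pandas as pd

from_list = pd.Series([3, 2, 0, 1])                # default index: 0, 1, 2, 3
from_dict = pd.Series({"a": 1, "b": 2})            # dictionary keys become the index
from_scalar = pd.Series(5, index=["x", "y", "z"])  # scalar repeated for every label

print(from_list.index.tolist())  # [0, 1, 2, 3]
print(from_dict.index.tolist())  # ['a', 'b']
print(from_scalar.tolist())      # [5, 5, 5]
```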

In [4]:
pds_from_array = pd.Series(arr1) #converts array to series

In [5]:
pds_from_array

0    83
1    98
2    13
3    80
4    65
5    43
6    18
7    61
8    60
9    98
dtype: int32

In [6]:
print(type(pds_from_array))

<class 'pandas.core.series.Series'>


**Note the capital `S` in `pd.Series`**

![image.png](attachment:image.png)

In [7]:
pds_apples = pd.Series([3,2,0,1],index=['Akash', 'Antony', 'Abrar', 'Prabha'], name='Apples' )
pds_oranges = pd.Series([0,3,7,2], index=['Akash', 'Antony', 'Abrar', 'Prabha'], name='Oranges')

In [8]:
pds_apples

Akash     3
Antony    2
Abrar     0
Prabha    1
Name: Apples, dtype: int64

In [9]:
pds_oranges

Akash     0
Antony    3
Abrar     7
Prabha    2
Name: Oranges, dtype: int64

### Extract the index and values from a Series

In [10]:
pds_apples

Akash     3
Antony    2
Abrar     0
Prabha    1
Name: Apples, dtype: int64

In [11]:
pds_apples.index

Index(['Akash', 'Antony', 'Abrar', 'Prabha'], dtype='object')

In [12]:
pds_apples.values

array([3, 2, 0, 1], dtype=int64)

- The same attributes work for `pds_oranges`
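The index labels are also what you use to look values up. A sketch that rebuilds `pds_apples` from above; `.iloc[]` looks up by position instead, and `.to_numpy()` is the method the pandas documentation recommends over `.values` for getting a NumPy array.

```python
import pandas as pd

pds_apples = pd.Series([3, 2, 0, 1],
                       index=["Akash", "Antony", "Abrar", "Prabha"], name="Apples")

print(pds_apples["Antony"])       # 2  (lookup by index label)
print(pds_apples.iloc[0])         # 3  (lookup by integer position)
print(pds_apples.to_numpy())      # [3 2 0 1]  (values as a NumPy array)
print(pds_apples.index.tolist())  # ['Akash', 'Antony', 'Abrar', 'Prabha']
```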

### From Dictionary

In [13]:
python_score ={'Akash':19, 'Monica': 18, 'Gaurav':5, 'Shivansh':17, 'Sangya':20, 'Ajay':9}

In [14]:
python_score

{'Akash': 19,
 'Monica': 18,
 'Gaurav': 5,
 'Shivansh': 17,
 'Sangya': 20,
 'Ajay': 9}

In [15]:
python_score.keys()

dict_keys(['Akash', 'Monica', 'Gaurav', 'Shivansh', 'Sangya', 'Ajay'])

In [16]:
python_score.values()

dict_values([19, 18, 5, 17, 20, 9])

In [17]:
python_score.items()

dict_items([('Akash', 19), ('Monica', 18), ('Gaurav', 5), ('Shivansh', 17), ('Sangya', 20), ('Ajay', 9)])

#### Convert dictionary to a Series

In [18]:
pds_python_score = pd.Series(python_score)

In [19]:
pds_python_score

Akash       19
Monica      18
Gaurav       5
Shivansh    17
Sangya      20
Ajay         9
dtype: int64

In [20]:
pds_python_score = pd.Series(python_score, index=['Akash', 'gaurav', 'Indrani', 'Prabha', 'Umar','Monica'])

In [21]:
pds_python_score

Akash      19.0
gaurav      NaN
Indrani     NaN
Prabha      NaN
Umar        NaN
Monica     18.0
dtype: float64

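`NaN` (Not a Number) marks missing data, i.e. empty or null values. Every label passed in `index` that is not a key of the dictionary gets `NaN`: `'Indrani'`, `'Prabha'` and `'Umar'` are absent, and `'gaurav'` does not match `'Gaurav'` because labels are case-sensitive. Since `NaN` is a float, the dtype changes from `int64` to `float64`.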

#### Observation: Without `index`, the dictionary keys become the index; with `index`, only the listed labels are kept, in that order
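A minimal sketch of both behaviors on a shortened version of the `python_score` dictionary (the names `s_keys` and `s_picked` are illustrative):

```python
import pandas as pd

scores = {"Akash": 19, "Monica": 18, "Gaurav": 5}

s_keys = pd.Series(scores)  # index taken from the dictionary keys, in insertion order
s_picked = pd.Series(scores, index=["Akash", "gaurav", "Monica"])  # only these labels kept

print(s_keys.index.tolist())  # ['Akash', 'Monica', 'Gaurav']
print(s_picked.tolist())      # [19.0, nan, 18.0] -> 'gaurav' is not 'Gaurav'
print(s_picked.dtype)         # float64, because NaN is a float
```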

### Q. Let us try to find all students who scored more than 10

In [22]:
pds_python_score

Akash      19.0
gaurav      NaN
Indrani     NaN
Prabha      NaN
Umar        NaN
Monica     18.0
dtype: float64

#### Criteria for filtering

In [23]:
filt = pds_python_score>10

In [24]:
filt

Akash       True
gaurav     False
Indrani    False
Prabha     False
Umar       False
Monica      True
dtype: bool

### Masking the data

In [25]:
pds_python_score[filt] # boolean masking: keep only the entries where filt is True

Akash     19.0
Monica    18.0
dtype: float64
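Masks can be combined. A sketch on a small made-up Series: conditions are joined with `&` (and) or `|` (or), each wrapped in parentheses, and any comparison with `NaN` evaluates to `False`, which is why the `NaN` students were dropped above.

```python
import numpy as np
import pandas as pd

s = pd.Series([19.0, np.nan, 5.0, 18.0], index=["Akash", "gaurav", "Gaurav", "Monica"])

print((s > 10).tolist())  # [True, False, False, True] -> NaN > 10 is False

# Combine conditions with & / |; the parentheses are required
between = s[(s > 10) & (s < 19)]
print(between.to_dict())  # {'Monica': 18.0}

# Select the missing entries instead
missing = s[s.isna()]
print(missing.index.tolist())  # ['gaurav']
```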

### Basic Descriptive Statistics

In [26]:
print('Avg. Python Score:', pds_python_score.mean())
print('Min. Python Score:', pds_python_score.min())
print('Max. Python Score:', pds_python_score.max())
print('Median Python Score:', pds_python_score.median())
print('Variance of Python Score:', pds_python_score.var())
print('Std. Deviation Score:', pds_python_score.std())

Avg. Python Score: 18.5
Min. Python Score: 18.0
Max. Python Score: 19.0
Median Python Score: 18.5
Variance of Python Score: 0.5
Std. Deviation Score: 0.7071067811865476
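These statistics are based on only two values (Akash's 19 and Monica's 18), because pandas skips `NaN` by default (`skipna=True`). A sketch reproducing that, plus the `ddof` parameter: `var()` and `std()` use the sample formula (`ddof=1`) unless told otherwise.

```python
import numpy as np
import pandas as pd

s = pd.Series([19.0, np.nan, np.nan, 18.0])

print(s.count())             # 2    -> number of non-null values
print(s.mean())              # 18.5 -> NaNs skipped by default
print(s.mean(skipna=False))  # nan  -> any NaN makes the result NaN
print(s.var())               # 0.5  -> sample variance, ddof=1
print(s.var(ddof=0))         # 0.25 -> population variance
```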


In [27]:
pds_python_score.median

<bound method NDFrame._add_numeric_operations.<locals>.median of Akash      19.0
gaurav      NaN
Indrani     NaN
Prabha      NaN
Umar        NaN
Monica     18.0
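dtype: float64>

Without parentheses, `median` is only referenced, not called, so Python shows the bound method object instead of a number. Call it as `pds_python_score.median()` to get the result (18.5).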

### Series Object Attributes

- `Series.index`: the index (row labels) of the Series
- `Series.shape`: a tuple with the shape of the data
- `Series.dtype`: the data type of the values
- `Series.size`: the number of elements, including `NaN`s
- `Series.empty`: `True` if the Series has no elements, otherwise `False`
- `Series.hasnans`: `True` if there are any `NaN` values, otherwise `False`
- `Series.nbytes`: the number of bytes used by the values
- `Series.ndim`: the number of dimensions (always 1 for a Series)

#### `index`

In [28]:
pds_python_score

Akash      19.0
gaurav      NaN
Indrani     NaN
Prabha      NaN
Umar        NaN
Monica     18.0
dtype: float64

In [29]:
pds_python_score.index

Index(['Akash', 'gaurav', 'Indrani', 'Prabha', 'Umar', 'Monica'], dtype='object')

#### `shape`

In [30]:
pds_python_score.shape

(6,)

#### `dtype`

In [31]:
pds_python_score.dtype

dtype('float64')

#### `size`

In [32]:
pds_python_score.size

6

#### `empty`

In [33]:
pds_python_score.empty

False

#### `hasnans`

In [34]:
pds_python_score.hasnans

True

#### `nbytes`

In [35]:
pds_python_score.nbytes

48

#### `ndim`

In [36]:
pds_python_score.ndim

1
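`size`, `count()` and `empty` are easy to mix up. A small sketch with a Series made only of `NaN`s: `size` counts every slot, `count()` counts only non-null values, and `empty` is `True` only when there are no elements at all.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan])

print(all_nan.size, all_nan.count())         # 2 0
print(all_nan.empty)                         # False: NaN entries still occupy slots
print(pd.Series([], dtype="float64").empty)  # True: no elements at all
print(all_nan.dropna().empty)                # True once the NaNs are dropped
```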

## How to combine multiple Series into a DataFrame?

In [38]:
pds_apples

Akash     3
Antony    2
Abrar     0
Prabha    1
Name: Apples, dtype: int64

In [39]:
pds_oranges

Akash     0
Antony    3
Abrar     7
Prabha    2
Name: Oranges, dtype: int64

### `pd.DataFrame()`

### Create a DataFrame from the two series

In [40]:
df_fruits_sales = pd.DataFrame({'Count of Apples': pds_apples, 'Count of Oranges': pds_oranges})

In [41]:
df_fruits_sales

| | Count of Apples | Count of Oranges |
|---|---|---|
| Akash | 3 | 0 |
| Antony | 2 | 3 |
| Abrar | 0 | 7 |
| Prabha | 1 | 2 |


In [42]:
print(type(df_fruits_sales))

<class 'pandas.core.frame.DataFrame'>


**Observe capital `D` and `F` in DataFrame**

`pd.DataFrame()` combines the two Series into a DataFrame, aligning the values by their **index** labels; each dictionary key becomes a column name.
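The alignment matters when the Series do not share exactly the same labels. A sketch with two made-up Series: the resulting index is the union of both indexes, and a label missing from one Series becomes `NaN` in that column.

```python
import pandas as pd

apples = pd.Series([3, 2], index=["Akash", "Antony"])
oranges = pd.Series([7, 2], index=["Antony", "Prabha"])

df = pd.DataFrame({"Apples": apples, "Oranges": oranges})

print(df.index.tolist())          # ['Akash', 'Antony', 'Prabha']
print(df.loc["Akash", "Oranges"])  # nan -> Akash has no orange count
print(df.loc["Antony", "Oranges"]) # 7.0 -> the column became float because of the NaN
```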

## Loading the Dataset

### Creating a Pandas DataFrame from a file --> `flat files`

#### What is a flat file?

A flat file is a file that stores tabular data on its own, with no structured relationships to other tables (unlike a relational database). It is a plain text file where each line represents a record, and the fields within each record are typically separated by **delimiters such as commas, tabs, or spaces**

![image.png](attachment:image.png)

![image.png](attachment:image.png)

#### What is a CSV file?

- A comma-separated values (CSV) file is a `delimited` text file that uses a **comma** to separate values.
- Each line of the file is a data record.
- Each record consists of one or more columns/fields, separated by commas.
- It holds tabular data (numbers, text, dates).
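To see these rules in action without a file on disk, a tiny made-up CSV can be held in a string and read through `io.StringIO`, which behaves like an open file. Note how a field that itself contains commas (like `Genre` in the IMDB data) is wrapped in double quotes so it stays one value.

```python
import io
import pandas as pd

# Made-up sample: header line + 2 records; quoted fields may contain commas
csv_text = """Title,Genre,Year
Split,"Horror,Thriller",2016
Sing,"Animation,Comedy,Family",2016
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)            # (2, 3)
print(df.loc[0, "Genre"])  # Horror,Thriller
```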

### Get the current working directory

In [43]:
import os

In [44]:
os.getcwd() # get current working directory

'C:\\Users\\think\\OneDrive - Thinking Mojo\\TSLC\\Intellipaat\\Session Master\\06.Data Science Weekday Batch - 11Oct'

### To see the list of files in the working directory

In [49]:
os.listdir() #lists all the files in the working directory

['.ipynb_checkpoints',
 'Flow_Control_Statements_20_Oct_APC.pdf',
 'heart.csv',
 'IMDB-Movie-Data.csv',
 'Introduction_to_Course_APC.pdf',
 'Intro_to_Data_Manipulation_31Oct_APC.pdf',
 'Intro_to_Data_Manipulation_using_NumPy_01Nov_APC.pdf',
 'Intro_to_Pandas_10Nov_APC.pdf',
 'Intro_to_Python_11Oct_APC.pdf',
 'Iris.csv',
 'M01-Basic Python-Session-3-13Oct-APC.ipynb',
 'M01-Basic Python-Session-4-17Oct-APC.ipynb',
 'M01-Basic Python-Session-4-18Oct-APC.ipynb',
 'M01-Basic Python-Session-5-19Oct-APC.ipynb',
 'M01-Basic Python-Session-6-20Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-10-27Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-6-20Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-7-8-24-25Oct-APC.ipynb',
 'M01-Flow Control Statements-Session-9-26Oct-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-12-01Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-13-02Nov-APC.ipynb',
 'M02-Data_Manipulation_NumPy-Session-14-03Nov-APC.ipynb',
 'M02-Data_Manipulation_NumP
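When the folder holds many files, a list comprehension over `os.listdir()` can narrow the listing down to the CSV files. `list_csv_files` below is a hypothetical helper written for this example, not part of `os` or pandas.

```python
import os

def list_csv_files(folder="."):
    """Return the sorted names of .csv files in `folder` (hypothetical helper)."""
    return sorted(name for name in os.listdir(folder) if name.lower().endswith(".csv"))

print(list_csv_files())  # the CSV files in the current working directory
```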

### Reading the `IMDB` data using `pd.read_csv()`

In [46]:
df_imdb  = pd.read_csv("IMDB-Movie-Data.csv")

In [48]:
df_imdb.head()

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diag... | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richar... | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling thea... | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth Ma... | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of th... | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola D... | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |


### `head()` and `tail()`

`head()` returns the first **5** rows of your data by default; pass a number, e.g. `head(2)`, to see a different count

In [50]:
df_imdb.head()

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diag... | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richar... | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling thea... | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth Ma... | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of th... | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola D... | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |


#### Pointers about IMDB data

- IMDb: Internet Movie Database
- 1,000 popular movies released between 2006 and 2016

Here's a dataset of the 1,000 most popular movies on IMDb from 2006 to 2016. The data fields included are:

Rank, Title, Genre, Description, Director, Actors, Year, Runtime, Rating, Votes, Revenue, Metascore

In [54]:
df_imdb.head(2)

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced ... | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a te... | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fa... | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |


In [56]:
df_imdb.head(2).T #transpose

| | 0 | 1 |
|---|---|---|
| Rank | 1 | 2 |
| Title | Guardians of the Galaxy | Prometheus |
| Genre | Action,Adventure,Sci-Fi | Adventure,Mystery,Sci-Fi |
| Description | A group of intergalactic criminals are forced ... | Following clues to the origin of mankind, a te... |
| Director | James Gunn | Ridley Scott |
| Actors | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S... | Noomi Rapace, Logan Marshall-Green, Michael Fa... |
| Year | 2014 | 2012 |
| Runtime (Minutes) | 121 | 124 |
| Rating | 8.1 | 7.0 |
| Votes | 757074 | 485820 |


`tail()` returns the last **5** rows of your data by default

In [57]:
df_imdb.tail()

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 995 | 996 | Secret in Their Eyes | Crime,Drama,Mystery | A tight-knit team of rising investigators, alo... | Billy Ray | Chiwetel Ejiofor, Nicole Kidman, Julia Roberts... | 2015 | 111 | 6.2 | 27585 | NaN | 45.0 |
| 996 | 997 | Hostel: Part II | Horror | Three American college students studying abroa... | Eli Roth | Lauren German, Heather Matarazzo, Bijou Philli... | 2007 | 94 | 5.5 | 73152 | 17.54 | 46.0 |
| 997 | 998 | Step Up 2: The Streets | Drama,Music,Romance | Romantic sparks occur between two dance studen... | Jon M. Chu | Robert Hoffman, Briana Evigan, Cassie Ventura,... | 2008 | 98 | 6.2 | 70699 | 58.01 | 50.0 |
| 998 | 999 | Search Party | Adventure,Comedy | A pair of friends embark on a mission to reuni... | Scot Armstrong | Adam Pally, T.J. Miller, Thomas Middleditch,Sh... | 2014 | 93 | 5.6 | 4881 | NaN | 22.0 |
| 999 | 1000 | Nine Lives | Comedy,Family,Fantasy | A stuffy businessman finds himself trapped ins... | Barry Sonnenfeld | Kevin Spacey, Jennifer Garner, Robbie Amell,Ch... | 2016 | 87 | 5.3 | 12435 | 19.64 | 11.0 |


`sample()` returns randomly selected rows of your data (one row by default; `sample(5)` returns 5)

In [59]:
df_imdb.sample(5)

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 681 | 682 | We Need to Talk About Kevin | Drama,Mystery,Thriller | Kevin's mother struggles to love her strange c... | Lynne Ramsay | Tilda Swinton, John C. Reilly, Ezra Miller, Ja... | 2011 | 112 | 7.5 | 104953 | 1.74 | 68.0 |
| 966 | 967 | L'odyssée | Adventure,Biography | Highly influential and a fearlessly ambitious ... | Jérôme Salle | Lambert Wilson, Pierre Niney, Audrey Tautou,La... | 2016 | 122 | 6.7 | 1810 | NaN | 70.0 |
| 656 | 657 | Boyhood | Drama | The life of Mason, from early childhood to his... | Richard Linklater | Ellar Coltrane, Patricia Arquette, Ethan Hawke... | 2014 | 165 | 7.9 | 286722 | 25.36 | 100.0 |
| 224 | 225 | We're the Millers | Comedy,Crime | A veteran pot dealer creates a fake family as ... | Rawson Marshall Thurber | Jason Sudeikis, Jennifer Aniston, Emma Roberts... | 2013 | 110 | 7.0 | 334867 | 150.37 | 44.0 |
| 726 | 727 | Friends with Benefits | Comedy,Romance | A young man and woman decide to take their fri... | Will Gluck | Mila Kunis, Justin Timberlake, Patricia Clarks... | 2011 | 109 | 6.6 | 286543 | 55.8 | 63.0 |


In [60]:
df_imdb.sample(5).T

| | 561 | 433 | 232 | 749 | 753 |
|---|---|---|---|---|---|
| Rank | 562 | 434 | 233 | 750 | 754 |
| Title | Evil Dead | Mission: Impossible - Ghost Protocol | Apocalypto | Percy Jackson: Sea of Monsters | This Means War |
| Genre | Fantasy,Horror | Action,Adventure,Thriller | Action,Adventure,Drama | Adventure,Family,Fantasy | Action,Comedy,Romance |
| Description | Five friends head to a remote cabin, where the... | The IMF is shut down when it's implicated in t... | As the Mayan kingdom faces its decline, the ru... | In order to restore their dying safe haven, th... | Two top CIA operatives wage an epic battle aga... |
| Director | Fede Alvarez | Brad Bird | Mel Gibson | Thor Freudenthal | McG |
| Actors | Jane Levy, Shiloh Fernandez, Jessica Lucas, Lo... | Tom Cruise, Jeremy Renner, Simon Pegg, Paula P... | Gerardo Taracena, Raoul Max Trujillo, Dalia He... | Logan Lerman, Alexandra Daddario, Brandon T. J... | Reese Witherspoon, Chris Pine, Tom Hardy, Til ... |
| Year | 2013 | 2011 | 2006 | 2013 | 2012 |
| Runtime (Minutes) | 91 | 132 | 139 | 106 | 103 |
| Rating | 6.5 | 7.4 | 7.8 | 5.9 | 6.3 |
| Votes | 133113 | 382459 | 247926 | 91684 | 154400 |
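Because `sample()` picks different rows on every run, the output above will not match yours. A sketch on a made-up 10-row DataFrame showing how `random_state` makes the selection reproducible, alongside the `n` argument of `head()` and `tail()`:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

first = df.sample(3, random_state=42)   # same seed -> same rows on every run
second = df.sample(3, random_state=42)
print(first.index.equals(second.index))  # True
print(len(df.sample()))                  # 1 -> default is a single row

print(df.head(2)["x"].tolist(), df.tail(2)["x"].tolist())  # [0, 1] [8, 9]
```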


### Getting info about your data

`df.info()` is a good first command after loading data: it lists each column's name, non-null count and dtype, plus the number of rows and the memory usage

In [61]:
df_imdb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB


![image.png](attachment:image.png)

### Missing Values Identification and Treatment

### Step 1: Identify the missing values

#### `df.isna()`, `df.isnull()` (aliases; both give the same result)

In [63]:
df_imdb.isna() # returns a Boolean DataFrame: True where a value is null, False where it is not

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | False | False | False | False | False | False | False | False | False | False | True | False |
| 996 | False | False | False | False | False | False | False | False | False | False | False | False |
| 997 | False | False | False | False | False | False | False | False | False | False | False | False |
| 998 | False | False | False | False | False | False | False | False | False | False | True | False |


`True` counts as `1` (null) and `False` as `0` (non-null), so summing the Boolean result gives the number of nulls in each column.

### Quantifying the nulls

In [64]:
df_imdb.isna().sum()

Rank                    0
Title                   0
Genre                   0
Description             0
Director                0
Actors                  0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64

In [65]:
df_imdb.isnull().sum()

Rank                    0
Title                   0
Genre                   0
Description             0
Director                0
Actors                  0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64

### In terms of percentage

In [67]:
df_imdb.isna().sum()/len(df_imdb)*100

Rank                   0.0
Title                  0.0
Genre                  0.0
Description            0.0
Director               0.0
Actors                 0.0
Year                   0.0
Runtime (Minutes)      0.0
Rating                 0.0
Votes                  0.0
Revenue (Millions)    12.8
Metascore              6.4
dtype: float64
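Since `True` counts as 1, the mean of the Boolean mask is already the fraction of nulls, so `isna().mean() * 100` is an equivalent shortcut for the percentage. A sketch on a made-up two-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Revenue": [1.0, np.nan, 3.0, np.nan],
                   "Rating": [7.0, 8.0, 6.5, 5.0]})

pct_sum = df.isna().sum() / len(df) * 100  # the approach used above
pct_mean = df.isna().mean() * 100          # mean of True/False = fraction of nulls

print(pct_mean.to_dict())           # {'Revenue': 50.0, 'Rating': 0.0}
print((pct_sum == pct_mean).all())  # True
```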

### Step 2: Treat the missing values

In [68]:
df_imdb.shape

(1000, 12)

![image.png](attachment:image.png)

There are a few general options for dealing with nulls:

1. **Deletion**: drop the rows or columns with nulls when they are of little importance
2. **Imputation**: replace nulls with non-null values (mean, median or mode imputation)
3. **ML-based imputation**: to be discussed in the ML module

... and many more
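Option 2 (imputation) is not used for the IMDB data below, but here is a minimal sketch of it with `fillna()` on a made-up DataFrame: the mean for a numeric column (`median()` works the same way) and the mode, the most frequent value, for a text column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Metascore": [60.0, np.nan, 80.0, 70.0],
    "Genre": ["Drama", "Comedy", None, "Drama"],
})

df_imputed = df.copy()  # keep the original untouched
# Numeric column: fill nulls with the column mean (60 + 80 + 70) / 3 = 70.0
df_imputed["Metascore"] = df_imputed["Metascore"].fillna(df_imputed["Metascore"].mean())
# Text column: fill nulls with the most frequent value
df_imputed["Genre"] = df_imputed["Genre"].fillna(df_imputed["Genre"].mode()[0])

print(df_imputed)
```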

#### Decision Point
`Which type of treatment should be done and why?`

**Decision Point: For the IMDB dataset, drop the rows with nulls**

#### `dropna()`

In [70]:
df_imdb.shape

(1000, 12)

In [71]:
df_imdb.dropna(inplace=True) # removes every row that has at least one null in any column

In [73]:
df_imdb.shape

(838, 12)

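`inplace=True` makes the change in the original DataFrame (`df_imdb`) and returns `None`; without it, `dropna()` returns a new DataFrame and leaves the original unchanged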
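A sketch of the non-inplace form and two other `dropna()` parameters, on a made-up DataFrame: `subset` limits which columns are checked, and `how="all"` drops a row only when every value in it is null.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Revenue": [1.0, np.nan, np.nan, 4.0],
    "Metascore": [50.0, 60.0, np.nan, np.nan],
})

cleaned = df.dropna()           # returns a new DataFrame; df itself is unchanged
print(df.shape, cleaned.shape)  # (4, 2) (1, 2)

# subset=: only look for nulls in the listed columns
print(df.dropna(subset=["Revenue"]).index.tolist())  # [0, 3]
# how="all": drop a row only if all of its values are null
print(df.dropna(how="all").index.tolist())           # [0, 1, 3]

result = df.copy().dropna(inplace=True)
print(result)                   # None -> inplace=True returns nothing
```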

In [74]:
df_imdb.isna().sum()

Rank                  0
Title                 0
Genre                 0
Description           0
Director              0
Actors                0
Year                  0
Runtime (Minutes)     0
Rating                0
Votes                 0
Revenue (Millions)    0
Metascore             0
dtype: int64

### H/W Prepare a summary on the missing rows
1. How many movies did we drop?
2. What is the average rating before and after dropping, broken down by set?
3. No. of votes

### H/W Subset the data from IMDB with missing records

`Get the 162 rows of missing data and prepare a summary`

In [75]:
df_imdb_1 = pd.read_csv("IMDB-Movie-Data.csv")

In [76]:
df_imdb_1.isna().sum()

Rank                    0
Title                   0
Genre                   0
Description             0
Director                0
Actors                  0
Year                    0
Runtime (Minutes)       0
Rating                  0
Votes                   0
Revenue (Millions)    128
Metascore              64
dtype: int64
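One way to start the homework, sketched on a made-up DataFrame rather than the IMDB data: `isna().any(axis=1)` flags every row with at least one null, which are exactly the rows `dropna()` removes. Applied to `df_imdb_1`, the same mask should select the 1000 - 838 = 162 dropped rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Revenue": [10.0, np.nan, 30.0, np.nan],
    "Metascore": [50.0, 60.0, np.nan, np.nan],
})

has_null = df.isna().any(axis=1)  # True for rows with at least one null
missing_rows = df[has_null]       # the rows dropna() would remove
kept_rows = df[~has_null]         # the rows dropna() would keep

print(missing_rows["Title"].tolist())                 # ['B', 'C', 'D']
print(len(missing_rows) + len(kept_rows) == len(df))  # True
```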

### Let us describe the statistics in the data

**`df.describe()`** - Generates high-level descriptive statistics (count, mean, std, min, quartiles, max), by default for the numeric columns only
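That is why text columns such as `Title` and `Director` do not appear in the output below. A sketch on a made-up DataFrame: `describe()` summarizes the numeric column, while calling it on a text column gives count, unique, top (most frequent value) and freq instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2014, 2012, 2016],
    "Director": ["James Gunn", "Ridley Scott", "James Gunn"],
})

num_summary = df.describe()          # only the numeric column 'Year' is summarized
print(num_summary.columns.tolist())  # ['Year']
print(num_summary.index.tolist())    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

text_summary = df["Director"].describe()          # text column: count, unique, top, freq
print(text_summary["top"], text_summary["freq"])  # James Gunn 2
```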

In [77]:
df_imdb.describe()

| | Rank | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|
| count | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 | 838.0 |
| mean | 485.247017 | 2012.50716 | 114.638425 | 6.81432 | 193230.3 | 84.564558 | 59.575179 |
| std | 286.572065 | 3.17236 | 18.470922 | 0.877754 | 193099.0 | 104.520227 | 16.952416 |
| min | 1.0 | 2006.0 | 66.0 | 1.9 | 178.0 | 0.0 | 11.0 |
| 25% | 238.25 | 2010.0 | 101.0 | 6.3 | 61276.5 | 13.9675 | 47.0 |
| 50% | 475.5 | 2013.0 | 112.0 | 6.9 | 136879.5 | 48.15 | 60.0 |
| 75% | 729.75 | 2015.0 | 124.0 | 7.5 | 271083.0 | 116.8 | 72.0 |
| max | 1000.0 | 2016.0 | 187.0 | 9.0 | 1791916.0 | 936.63 | 100.0 |


### Show the list of all columns

#### `df.columns`

In [79]:
df_imdb.columns

Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')

## DataFrame Slicing, Selecting & Extracting

In [80]:
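arr_4d = np.array([  # despite the name, this is a 3-D array: 3 layers x 3 rows x 3 columns
    [[10,11,12], [13,14,15], [16,17,18]],  # first layer
    [[20,21,22], [23,24,25], [26,27,28]],  # second layer
    [[30,31,32], [33,34,35], [36,37,38]]   # third layer
])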


In [81]:
arr_4d.ndim

3

In [82]:
arr_4d

array([[[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]],

       [[20, 21, 22],
        [23, 24, 25],
        [26, 27, 28]],

       [[30, 31, 32],
        [33, 34, 35],
        [36, 37, 38]]])