# Brief Introduction to Pandas and Data Sets

<font color=yellow>

* What is pandas?
* How does one access data using Python?
* Explore some basic features of pandas for accessing data.
* Answer the question, related to IMDb, "When it comes to rating movies, how different are critics and fans?"

### Import Packages
* pandas is the Python Data Analysis Library (https://pandas.pydata.org/about/index.html)
* os is a package for operating system functionality
* The syntax is <font color = green>import</font> package <font color = green>as</font> object
* When using import *** as ***, you may name the object whatever you like. However, there are community norms to which programmers adhere. If you plan on posting code to GitHub for others to see your project or use your code, it would be beneficial to those readers if you followed those norms.
* Conventional names for common package objects are: numpy = np, pandas = pd, matplotlib.pyplot = plt
* You will notice below that we did not give the os package an alias - its name is only two letters, so we didn't bother

###### <font color = yellow> Jupyter notebooks are made up of cells, kind of like people but not at all like people. Anyhow, cells are executed using the keys SHIFT-ENTER simultaneously. You can click the play button on the menu at the top but no one, I mean no one serious, does that. Just make sure your cursor is in the cell you want to execute.

In [None]:
import pandas as pd
import os

### Section: Load the Dataframe

###### Use listdir to see what items are in the current folder

In [None]:
os.listdir() # Check what is in the current folder

###### Read the data, in this case from a comma-separated values (csv) file, into an object called a dataframe. You may name the dataframe whatever you like. The most vanilla conventional name is <font color = yellow>df</font>.

In [None]:
df = pd.read_csv('imdb_movie_data_2023.csv') # Read the csv file
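
To see what read_csv does without needing the file on disk, here is a minimal sketch that reads the same kind of text from an in-memory string. The three rows and the name sample_df are made up for illustration; they are not taken from the real dataset, and only the column names mirror the ones used in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the csv file: three invented rows, not real IMDb data.
csv_text = """Moive Name,Rating,Meta Score
Film A,8.1,75
Film B,6.4,52
Film C,7.3,88
"""

# read_csv accepts a file path or any file-like object; StringIO wraps the string.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (number of rows, number of columns)
print(sample_df.dtypes)  # column types inferred from the text
```

The first line of the text becomes the column headings, and pandas infers a numeric type for each column that contains only numbers.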

###### The df object has several methods, one is named head, which may be used to peek at the first 5 rows of the dataframe. It is typically used to gather some intelligence about the data's structure and to see the column headings.

In [None]:
df.head() # Peek at the first 5 rows

###### Access a column of data in a couple of ways: attribute access and bracket indexing

In [None]:
df.Rating

In [None]:
df['Rating']
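
The two forms above return the same column. The sketch below checks this on a small made-up dataframe (sample_df and its values are invented for illustration) and shows why the bracket form is the more general one: a column name containing a space, such as Meta Score, cannot be written as an attribute.

```python
import pandas as pd

# Invented values for illustration only; not real IMDb data.
sample_df = pd.DataFrame({
    "Rating": [8.1, 6.4, 7.3],
    "Meta Score": [75, 52, 88],
})

by_attribute = sample_df.Rating      # attribute access
by_bracket = sample_df["Rating"]     # bracket indexing

# Both forms give the same Series.
print(by_attribute.equals(by_bracket))

# "Meta Score" contains a space, so only the bracket form works for it.
meta = sample_df["Meta Score"]
print(meta.tolist())
```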

###### Describe a column of the dataframe

In [None]:
df['Rating'].describe()

In [None]:
df['Moive Name'] # Yes, "Movie" is misspelled by the creators of the dataset; this kind of thing happens

### Section: Examine the Dataframe

###### Only display the quantitative summary data of the dataframe

In [None]:
df.describe()

###### Use kde (kernel density estimate) or a histogram to visualize the data's density
Notice that a trailing semicolon suppresses the text output that would otherwise be displayed along with the plot

In [None]:
df.Rating.plot.hist()

In [None]:
df.Rating.plot.kde();

###### Descriptive Statistics

In [None]:
print(df.Rating.mean())
print(f'The mean Rating is {df.Rating.mean():2.4f}')
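
The part after the colon inside the f-string braces is a format specification. In 2.4f, the 2 is a minimum field width, the .4 asks for four digits after the decimal point, and f means fixed-point notation. A small sketch with a made-up number (value is not taken from the dataset):

```python
# Made-up number for illustration; not the actual mean Rating.
value = 6.71234567

four_decimals = f"{value:.4f}"    # four digits after the decimal point
narrow_width = f"{value:2.4f}"    # minimum width 2: the text is already longer, so no padding
wide_width = f"{value:10.4f}"     # minimum width 10: padded on the left with spaces

print(four_decimals)
print(narrow_width)
print(repr(wide_width))
```

Because the formatted number is already longer than two characters, the width of 2 has no visible effect here; a larger width is what you would use to line numbers up in columns.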

###### What are the columns?

In [None]:
df.columns

###### The loc indexer: grab some of the columns

In [None]:
df.loc[:,['Rating','Meta Score']]
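
loc takes a row selector before the comma and a column selector after it; a bare colon means "all rows". The sketch below uses a small made-up dataframe (sample_df and its values are invented) to show that selecting all rows with loc matches plain bracket selection with a list of columns, and that the row selector can also be a condition.

```python
import pandas as pd

# Invented values for illustration only; not real IMDb data.
sample_df = pd.DataFrame({
    "Moive Name": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "Rating": [8.1, 6.4, 7.3, 5.9, 9.0],
    "Meta Score": [75, 52, 88, 90, 81],
})

# All rows, two columns.
two_cols = sample_df.loc[:, ["Rating", "Meta Score"]]

# The same selection using a list of column names in brackets.
same = sample_df[["Rating", "Meta Score"]]
print(two_cols.equals(same))

# Rows where Rating is above 7, keeping only two columns.
high_rated = sample_df.loc[sample_df["Rating"] > 7, ["Moive Name", "Rating"]]
print(high_rated)
```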

###### What is the correlation between Rating and Meta Score?

In [None]:
df.loc[:,['Rating','Meta Score']].corr()
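
corr returns a small table with one row and one column per selected column; the diagonal is always 1 because each column is perfectly correlated with itself, and the off-diagonal entry is the number of interest. A sketch on made-up values (sample_df is invented, so the resulting correlation says nothing about the real dataset) that also pulls out that single number:

```python
import pandas as pd

# Invented values for illustration only; not real IMDb data.
sample_df = pd.DataFrame({
    "Rating": [8.1, 6.4, 7.3, 5.9, 9.0],
    "Meta Score": [75, 52, 88, 90, 81],
})

# 2 x 2 correlation table (Pearson correlation by default).
corr_matrix = sample_df.loc[:, ["Rating", "Meta Score"]].corr()
print(corr_matrix)

# The same off-diagonal value as a single number, via Series.corr.
r = sample_df["Rating"].corr(sample_df["Meta Score"])
print(f"Correlation between Rating and Meta Score: {r:.4f}")
```

A value near 1 would mean critics and fans tend to agree, a value near 0 would mean their scores are largely unrelated, and a negative value would mean they tend to disagree.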

###### What are the top 10 highest rated movies?

In [None]:
df.nlargest(10,'Rating')

###### What are the top 10 highest rated movies according to critics?

In [None]:
df.nlargest(10,'Meta Score')
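
One way to compare the two rankings is to check which titles appear in both top lists. The sketch below does this for the top 3 of a small made-up dataframe (sample_df and its values are invented for illustration; the variable names top_fans, top_critics, and in_both are hypothetical). It assumes that Rating reflects fans and Meta Score reflects critics, as the question at the top of this notebook suggests.

```python
import pandas as pd

# Invented values for illustration only; not real IMDb data.
sample_df = pd.DataFrame({
    "Moive Name": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "Rating": [8.1, 6.4, 7.3, 5.9, 9.0],
    "Meta Score": [75, 52, 88, 90, 81],
})

# Top 3 titles by each score, highest first.
top_fans = sample_df.nlargest(3, "Rating")["Moive Name"].tolist()
top_critics = sample_df.nlargest(3, "Meta Score")["Moive Name"].tolist()

# Titles that appear in both lists.
in_both = sorted(set(top_fans) & set(top_critics))

print(top_fans)
print(top_critics)
print(in_both)
```

With the real dataframe, the same pattern with 10 in place of 3 would show how many of the critics' top 10 are also in the fans' top 10.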

[back-to-top](#Section:-Load-the-Dataframe)