<a href="https://colab.research.google.com/github/Lokeshpatnana/Pandas/blob/main/Intro_to_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd

# Pandas - Brief Intro

1. Data analysis library that makes it easy to read and work with different types of data
 * Load large amounts of data into Python
 * Work with different file formats (CSV, SQL, Excel, etc.)
 * Work with time-series data (stock prices, sales data, etc.)

2. Cleaning the data and handling missing values
3. Built on top of NumPy, so column-wise (vectorized) operations are fast


> **Essential for everyone working with data (Data Science, ML, Analytics, etc.)**



# Walkthrough of Datasets

## Stack Overflow Dataset

With nearly 65,000 responses fielded from over 180 countries and dependent territories, the 2020 Annual Developer Survey by Stack Overflow examines all aspects of the developer experience from career satisfaction and job search to education and opinions on open source software.

**Source**: https://insights.stackoverflow.com/survey

In [None]:
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/survey_results_public.csv

## Covid Dataset

This dataset provides day-wise metrics of COVID-19 in Italy for around 200 days.

**Source**: https://hub.jovian.ml/wp-content/uploads/2020/09/italy-covid-daywise.csv

In [None]:
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/italy-covid-daywise.csv

## Film Dataset


About 1000 movies with properties such as length, main actor and actress, director and popularity.

**Source**: https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

In [None]:
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/film.csv

## eCommerce Dataset


This dataset provides shopping data related to various products and orders over the course of a year.

**Source**: https://github.com/KeithGalli/Pandas-Data-Science-Tasks

In [None]:
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/shopping_data_v2.csv

# Reading & Writing files

Reading (loading) data from a CSV file with `pd.read_csv`

In [None]:
shopping_df = pd.read_csv('shopping_data_v2.csv')
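Because `shopping_data_v2.csv` only exists after the download above, the sketch below uses a small hypothetical CSV held in memory (via `io.StringIO`) to show what `read_csv` produces. The column names and values are invented for illustration; `read_csv` accepts a file path, a URL, or any file-like object.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a small file on disk
csv_text = """Product,Price,Order Date
USB-C Cable,11.95,04/19/19
Wired Headphones,11.99,04/07/19
Monitor,149.99,04/12/19
"""

# read_csv parses the header row into column labels and infers a dtype per column
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Numeric-looking columns such as `Price` are parsed as numbers, while the date column stays as text unless you ask pandas to parse it.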

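Writing the data to a CSV file. By default, `to_csv` also writes the row index as an unnamed first column; pass `index=False` to leave it out.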

In [None]:
shopping_df.to_csv('shopping_data_copy.csv')
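A minimal sketch of how the row index affects the written file, using a hypothetical two-row DataFrame. When no path is given, `to_csv` returns the CSV text as a string, which makes the difference easy to inspect.

```python
import io

import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame({"Product": ["Monitor", "Cable"], "Quantity": [1, 2]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()               # header starts with an empty index column
without_index = df.to_csv(index=False)  # header holds only the data columns
print(with_index)
print(without_index)

# Reading the index-free text back reproduces the original frame
round_trip = pd.read_csv(io.StringIO(without_index))
```

Writing with the default `index=True` and reading back would instead produce an extra `Unnamed: 0` column, which is a common surprise.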

# Dataframe
**A dataframe is like a dictionary of lists, but with much more functionality**

*   A table of data (Rows and Columns)
*   Two-dimensional data structure


## Properties of a Dataframe

**`df.shape` returns the dimensions of a DataFrame as a (rows, columns) tuple**



In [None]:
shopping_df.shape
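A small sketch on a hypothetical DataFrame, showing that `shape` is a plain tuple that can be unpacked, and that `len(df)` gives the row count alone.

```python
import pandas as pd

# Hypothetical 4-row, 2-column DataFrame
df = pd.DataFrame({"Product": ["A", "B", "C", "D"], "Quantity": [1, 2, 1, 3]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 4 2

# len(df) returns just the number of rows
print(len(df))  # 4
```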

**`df.columns` returns the column labels of a DataFrame**


In [None]:
shopping_df.columns
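`df.columns` is a pandas `Index` object rather than a list. The sketch below, on a hypothetical DataFrame, converts it to a list and uses it for a membership check.

```python
import pandas as pd

# Hypothetical single-row DataFrame
df = pd.DataFrame({"Product": ["A"], "Quantity": [1], "Order Date": ["04/19/19"]})

# Convert the Index of column labels to a plain Python list
cols = df.columns.tolist()
print(cols)  # ['Product', 'Quantity', 'Order Date']

# Membership tests work directly on the Index
has_product = "Product" in df.columns
```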

**`df.dtypes` returns the datatypes of each column in a DataFrame**


In [None]:
shopping_df.dtypes
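A sketch of how pandas infers column dtypes, using a hypothetical DataFrame. Integer and float columns get NumPy dtypes (`int64`, `float64`); text columns show as `object` in older pandas versions, while newer versions may use a dedicated string dtype, so the output for `Product` depends on your installed version.

```python
import pandas as pd

# Hypothetical DataFrame with one column of each common kind
df = pd.DataFrame({
    "Quantity": [1, 2, 3],         # integers
    "Price": [11.95, 2.99, 99.0],  # floats
    "Product": ["A", "B", "C"],    # text
})

# dtypes is a Series that maps each column name to its dtype
print(df.dtypes)
```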

### df.head
* `df.head(n=5)`
  *   Returns the first `n` rows.
  * For negative values of `n`, this function returns all rows except the last `|n|` rows.
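To make the negative-`n` behaviour concrete, the sketch below uses a hypothetical 10-row DataFrame and checks that `head(-n)` matches the positional slice `iloc[:-n]`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical 10-row frame

first_three = df.head(3)            # rows 0, 1, 2
all_but_last_three = df.head(-3)    # rows 0 through 6

# head(-n) drops the last n rows, exactly like slicing by position
print(all_but_last_three.equals(df.iloc[:-3]))  # True
```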


In [None]:
shopping_df.head()

In [None]:
shopping_df.head(10)

In [None]:
shopping_df.head(-1000)

### df.tail
* `df.tail(n=5)`
  *   Returns the last `n` rows.
  * For negative values of `n`, this function returns all rows except the first `|n|` rows.
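The mirror-image check for `tail`, again on a hypothetical 10-row DataFrame: `tail(-n)` matches the positional slice `iloc[n:]`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical 10-row frame

last_two = df.tail(2)                # rows 8, 9
all_but_first_three = df.tail(-3)    # rows 3 through 9

# tail(-n) drops the first n rows, exactly like slicing by position
print(all_but_first_three.equals(df.iloc[3:]))  # True
```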

In [None]:
shopping_df.tail()

In [None]:
shopping_df.tail(15)

In [None]:
shopping_df.tail(-100)

### df.describe
* `df.describe(include=None, exclude=None)`
  * Generates descriptive statistics.
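  * By default, only numeric columns are summarized; `include` and `exclude` extend the summary to object (string) columns or mixed data types.
  * `include`: A list of data types to include in the result, or `"all"` to include every column.
  * `exclude`: A list of data types to omit from the result.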
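A sketch on a hypothetical DataFrame with one text column and one numeric column, showing which columns each variant of `describe` summarizes. Numeric columns get count/mean/std/quartiles; text columns get count/unique/top/freq. Here `"number"` is the dtype alias that `describe` passes through to column selection.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Cable", "Monitor", "Cable"],  # text column
    "Quantity": [2, 1, 3],                      # numeric column
})

numeric_summary = df.describe()                  # numeric columns only (the default)
full_summary = df.describe(include="all")        # every column
text_summary = df.describe(exclude=["number"])   # everything except numeric columns

print(numeric_summary)
print(text_summary)
```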

In [None]:
shopping_df.describe()

In [None]:
shopping_df.describe(include=[object])

In [None]:
shopping_df.describe(include="all")

In [None]:
shopping_df.describe(exclude=[object])

### df.info
* `df.info()`
  * Prints a concise summary of a DataFrame, including the dtypes of the columns, non-null counts, memory usage, etc.
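`info()` prints to standard output and returns `None`. The sketch below, on a hypothetical DataFrame with one missing value, passes the `buf` argument to capture the text instead, so you can see that the non-null counts reveal missing data.

```python
import io

import pandas as pd

# Hypothetical DataFrame with one missing Product value
df = pd.DataFrame({"Product": ["Cable", None, "Monitor"], "Quantity": [2, 1, 3]})

# Capture the summary text in a buffer instead of printing it directly
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```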

In [None]:
shopping_df.info()

## Comparison with a Dictionary of Lists

In [None]:
people = {
    "first": ["Kristen", "Maxine", "John"],
    "last": ["Carol", "Williams", "Smith"],
    "email": ["KristenC@gmail.com", "Maxine.Williams@email.com", "JohnSmith@email.com"]
}

Creating a DataFrame from a dictionary of lists

In [None]:
df = pd.DataFrame(people)
print(df)

In [None]:
people["first"]

In [None]:
df["last"]

In [None]:
type(df["first"])
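A short sketch of where the dictionary analogy holds and where a DataFrame goes further. It uses its own hypothetical `names` dictionary so it does not overwrite `people` or `df` from the cells above.

```python
import pandas as pd

names = {
    "first": ["Kristen", "Maxine", "John"],
    "last": ["Carol", "Williams", "Smith"],
}
names_df = pd.DataFrame(names)

# Column access mirrors dictionary key access
print(names_df["first"].tolist() == names["first"])  # True

# A DataFrame can also select a whole row by label, which a dict of lists cannot do directly
row = names_df.loc[1]
print(row["first"], row["last"])  # Maxine Williams

# Vectorized operations combine entire columns element by element
full_names = names_df["first"] + " " + names_df["last"]
```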

# Series
**A Series is like a list of data, but with much more functionality**


Creating a Series from a list of data

In [None]:
pd.Series([1, 2, 3, 4, "asdf"])
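A sketch of two Series details worth knowing: mixing numbers and strings forces the generic `object` dtype, and a Series can carry custom index labels instead of the default 0 to n-1. The product names and prices are hypothetical.

```python
import pandas as pd

# Mixed ints and a string: pandas falls back to the generic object dtype
mixed = pd.Series([1, 2, 3, 4, "asdf"])
print(mixed.dtype)  # object

# Homogeneous integers get a numeric dtype and support arithmetic
nums = pd.Series([1, 2, 3, 4])

# Custom index labels (hypothetical product names)
prices = pd.Series([11.95, 149.99], index=["Cable", "Monitor"])
print(prices["Monitor"])  # 149.99
```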

In [None]:
type(shopping_df['Product'])

# Accessing Data

## Indexing

We can retrieve a specific value from a Series by its index label using the indexing notation **`[]`**


In [None]:
shopping_df['Product'][200]
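With the default `RangeIndex` that `read_csv` creates, label 200 is also position 200 (the 201st row), so the distinction is invisible above. The sketch below uses a hypothetical Series whose integer labels do not match positions, to show that `[]` looks up labels while `.iloc` looks up positions.

```python
import pandas as pd

# Hypothetical Series whose integer labels run in reverse order
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])

by_label = s[0]          # label 0 -> "c"
by_position = s.iloc[0]  # first position -> "a"
print(by_label, by_position)  # c a
```

This matters after sorting or filtering a DataFrame, when labels and positions no longer line up.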

### df.iloc

**Accessing by position numbers**

* `df.iloc[]`

  * Purely integer-location based indexing for selection by position.
  * Unlike `loc`, when a slice is passed the **stop position is excluded**, as in standard Python slicing.
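A sketch of the main `iloc` input forms on a hypothetical DataFrame. The string row labels are there only to make it obvious that `iloc` ignores labels and works purely by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Product": ["A", "B", "C", "D"], "Quantity": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],  # non-integer labels make "by position" explicit
)

cell = df.iloc[0, 1]              # row position 0, column position 1 -> 1
subset = df.iloc[[0, 2], [0, 1]]  # rows 0 and 2, columns 0 and 1
first_two = df.iloc[0:2]          # positions 0 and 1; stop (2) is excluded
last_row = df.iloc[-1]            # negative positions count from the end
```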

In [None]:
shopping_df.iloc[[0, 2], [0, 3]]

In [None]:
shopping_df.iloc[0, 1]

### df.loc

**Accessing by label / name**

* `df.loc[]`

  * Access a group of rows and columns by label(s) or a boolean array
  * Allowed inputs are:
    * single label
    * list or array of labels
    * slice object with labels (**both start and stop are included**)
    * boolean array of the same length as the axis being sliced
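A sketch of each allowed `loc` input from the list above, on a hypothetical DataFrame with string row labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"Product": ["A", "B", "C", "D"], "Quantity": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

single = df.loc["x", "Product"]              # single row label + column label -> "B"
rows = df.loc[["z", "w"]]                    # list of labels, returned in that order
span = df.loc["x":"z"]                       # label slice: both "x" and "z" are included
large = df.loc[df["Quantity"] > 2, "Product"]  # boolean array with one entry per row
```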

In [None]:
shopping_df.loc[980]

In [None]:
shopping_df.loc[[980, 456]]

In [None]:
shopping_df.loc[[980, 456], ['Product', 'Order Date']]

In [None]:
shopping_df.loc[45, 'Product']


### df.at
* `df.at[]`
  * Access a single value for a row/column label pair.
  * Can also set/update a value at a specified row/column pair
  * Use `at` only if you need to get or set a single value in a DataFrame or Series.
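A sketch of reading and writing a single cell with `at` on a hypothetical DataFrame, and of the fact that it returns the same scalar `loc` would for that row/column pair.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["A", "B"], "Quantity": [1, 2]})

# Read one scalar by row label and column label
before = df.at[1, "Quantity"]

# Write one scalar in place
df.at[1, "Quantity"] = 5

# For a single cell, at and loc agree; at just does not accept lists or slices
same = df.at[1, "Quantity"] == df.loc[1, "Quantity"]
```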


In [None]:
shopping_df.at[50000, 'Product']

In [None]:
df_copy = shopping_df.copy()

In [None]:
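print(df_copy.at[50000, 'Product'])               # value before the update
df_copy.at[50000, 'Product'] = "Micro USB Cable"  # update only the copy
print(df_copy.at[50000, 'Product'], "\n")         # updated value in the copy

print(shopping_df.at[50000, 'Product'])           # the original DataFrame is unchanged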

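## Slicing

Passing an integer slice directly to `[]` selects rows by position, so `shopping_df[0:3]` returns the first three rows (the stop position is excluded).
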
In [None]:
shopping_df[0:3]

When slicing using **loc**, the start and stop labels are both **included**.

In [None]:
shopping_df.loc[7:9]

In [None]:
shopping_df.loc[23:45, 'Product']

In [None]:
shopping_df.loc[10:20, 'Product':'Order Date']
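A sketch contrasting inclusive label slicing with `loc` (on rows and on columns) and positional slicing with plain `[]`, using a hypothetical five-row DataFrame whose column names echo the ones above.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Quantity": [1, 2, 3, 4, 5],
    "Order Date": ["d1", "d2", "d3", "d4", "d5"],  # hypothetical placeholder dates
})

rows = df.loc[1:3]                           # row labels 1, 2 and 3 (stop included)
block = df.loc[1:3, "Product":"Quantity"]    # column label slice is inclusive too
positional = df[0:3]                         # plain [] slicing: positions 0-2, stop excluded
```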

When slicing using **iloc**, the stop position is **excluded**.

In [None]:
shopping_df.iloc[1:2, 0:3]

In [None]:
shopping_df.iloc[1, 0:3]
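The two `iloc` calls above select the same values but return different types: a slice for the row keeps a one-row DataFrame, while a single integer position collapses the row into a Series. A sketch on a hypothetical 3×4 DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [0, 0, 0]})

one_row_frame = df.iloc[1:2, 0:3]  # row slice -> DataFrame with 1 row and 3 columns
one_row_series = df.iloc[1, 0:3]   # single row position -> Series of 3 values
```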

# Try It Yourself


For the following questions, use the **film** dataset.
0. Load the dataset into a dataframe using `read_csv`
1. Get the shape of the dataset.
2. Get the names of the first 6 films.
3. Generate descriptive statistics for the dataset, including all datatypes.
4. Change the 'Popularity' at the third index to 70.
5. Get the data in the first two columns for the last five rows in the dataset.