# **Pandas**

## What is Pandas?

Pandas is a Python library <ins>used for data manipulation and analysis</ins>, providing data structures like DataFrames that allow for easy handling of structured data such as tables.


## Why Pandas?

- Simple to use
- Integrates with many other Python data science and ML tools, such as NumPy, Matplotlib, scikit-learn, TensorFlow, and Seaborn.
- Helps to prepare data for machine learning.


## What is Raw Data?

Raw data is <ins>unprocessed or unrefined data</ins>: data that has been gathered from various sources but has not yet been cleaned, organized, or analyzed.
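To make the idea concrete, here is a minimal sketch of what raw data can look like and one way to tidy it. The records are hypothetical (made up for illustration), and dropping incomplete rows is only one of several reasonable cleaning choices.

```python
import pandas as pd

# Hypothetical raw records: prices arrive as text with currency symbols,
# and one odometer reading is missing.
raw = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Odometer (KM)": [150043, None, 11179],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# Strip "$" and "," so the price text can be converted to numbers.
cleaned = raw.copy()
cleaned["Price"] = (
    cleaned["Price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Drop rows with missing values (one possible choice, not the only one).
cleaned = cleaned.dropna()

print(raw)
print(cleaned)
```

After cleaning, `Price` holds numbers rather than text, so it can be summed, averaged, or passed to a model.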

<br/>

In [1]:
import pandas as pd

## 01 - Main Data Types

1. Series (1-dimensional)
2. DataFrames (2-dimensional)

In [4]:
series = pd.Series(["BMW", "Toyota", "Honda"])
series

0       BMW
1    Toyota
2     Honda
dtype: object

<br/>

In [6]:
car_data = pd.DataFrame({"Car make": series, "Colour": ["Red", "Blue", "White"]})
car_data

| | Car make | Colour |
|---|---|---|
| 0 | BMW | Red |
| 1 | Toyota | Blue |
| 2 | Honda | White |
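As a small sketch of how the two data types relate, the snippet below rebuilds the same `series` and `car_data` objects and checks their dimensions. It also shows that selecting a single column from a DataFrame returns a Series.

```python
import pandas as pd

series = pd.Series(["BMW", "Toyota", "Honda"])
car_data = pd.DataFrame({"Car make": series, "Colour": ["Red", "Blue", "White"]})

# A Series has one dimension; a DataFrame has two (rows and columns).
print(series.ndim, series.shape)      # 1 (3,)
print(car_data.ndim, car_data.shape)  # 2 (3, 2)

# Selecting a single column from a DataFrame gives back a Series.
colours = car_data["Colour"]
print(type(colours).__name__)  # Series
```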


<br/>

## 02 - Pandas for imported data

In [7]:
# Import Data
car_sales_df = pd.read_csv("car-sales.csv")
car_sales_df

| | Make | Colour | Odometer (KM) | Doors | Price |
|---|---|---|---|---|---|
| 0 | Toyota | White | 150043 | 4 | $4,000.00 |
| 1 | Honda | Red | 87899 | 4 | $5,000.00 |
| 2 | Toyota | Blue | 32549 | 3 | $7,000.00 |
| 3 | BMW | Black | 11179 | 5 | $22,000.00 |
| 4 | Nissan | White | 213095 | 4 | $3,500.00 |
| 5 | Toyota | Green | 99213 | 4 | $4,500.00 |
| 6 | Honda | Blue | 45698 | 4 | $7,500.00 |
| 7 | Honda | Blue | 54738 | 4 | $7,000.00 |
| 8 | Toyota | White | 60000 | 4 | $6,250.00 |
| 9 | Nissan | White | 31600 | 4 | $9,700.00 |
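After importing, it is usually worth checking what pandas actually read. The sketch below does that without needing `car-sales.csv` on disk: it assumes the file has the five columns shown in the output above and uses an in-memory string holding the first three rows as a stand-in.

```python
import io
import pandas as pd

# Stand-in for car-sales.csv (assumed layout): the first three rows shown above.
csv_text = '''Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
'''

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (3, 5)
print(df.dtypes)   # Odometer and Doors are integers; Price is text because of "$" and ","
print(df.head(2))  # first two rows
```

Note that `Price` comes in as text, not as a number, so it would need cleaning before any arithmetic.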


<br/>

## 03 - Exporting a dataframe

In [9]:
car_sales_df.to_csv("exported-car-sales.csv")

In [13]:
exported_car_sales_df = pd.read_csv("exported-car-sales.csv")
exported_car_sales_df[:5]

| | Unnamed: 0 | Make | Colour | Odometer (KM) | Doors | Price |
|---|---|---|---|---|---|---|
| 0 | 0 | Toyota | White | 150043 | 4 | $4,000.00 |
| 1 | 1 | Honda | Red | 87899 | 4 | $5,000.00 |
| 2 | 2 | Toyota | Blue | 32549 | 3 | $7,000.00 |
| 3 | 3 | BMW | Black | 11179 | 5 | $22,000.00 |
| 4 | 4 | Nissan | White | 213095 | 4 | $3,500.00 |


> The re-imported DataFrame has an extra column named "Unnamed: 0". This is the original DataFrame's index, which `to_csv` writes as an unnamed first column by default. Pass `index=False` when exporting to leave it out of the CSV file.

In [14]:
# Exclude the index column when exporting to a CSV file
car_sales_df.to_csv("exported-car-sales.csv", index=False)

In [15]:
exported_car_sales_df = pd.read_csv("exported-car-sales.csv")
exported_car_sales_df[:5]

| | Make | Colour | Odometer (KM) | Doors | Price |
|---|---|---|---|---|---|
| 0 | Toyota | White | 150043 | 4 | $4,000.00 |
| 1 | Honda | Red | 87899 | 4 | $5,000.00 |
| 2 | Toyota | Blue | 32549 | 3 | $7,000.00 |
| 3 | BMW | Black | 11179 | 5 | $22,000.00 |
| 4 | Nissan | White | 213095 | 4 | $3,500.00 |
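The export behaviour can be reproduced in a self-contained way. This sketch uses a small hypothetical DataFrame and in-memory strings instead of files. It also shows `index_col=0`, which may help when a file has already been exported with its index.

```python
import io
import pandas as pd

# Small hypothetical DataFrame standing in for car_sales_df.
df = pd.DataFrame({"Make": ["Toyota", "Honda"], "Doors": [4, 4]})

# Default export: the index is written as an unnamed first column.
# (Calling to_csv without a path returns the CSV text as a string.)
with_index = df.to_csv()
# index=False leaves the index out.
without_index = df.to_csv(index=False)

print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
# ['Unnamed: 0', 'Make', 'Doors']
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())
# ['Make', 'Doors']

# If a file was already exported with its index, index_col=0 reads that
# first column back as the index instead of as data.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored.columns.tolist())
# ['Make', 'Doors']
```

Using `index=False` at export time keeps the file clean for other tools. `index_col=0` is a fallback for files you cannot re-export.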
