# Introduction

### Why use the Pandas library

- It's simple to use
- Integrated with many other data science and ML Python tools
- Helps you get your data ready for machine learning (properly formatted data)

### Series, DataFrames and CSV files

In [1]:
import pandas as pd

#### Pandas has two main data types
- Series
- DataFrames

In [2]:
# Create a Series (pd.Series accepts a Python list as its data)
series = pd.Series(["BMW", "Toyota", "Honda"])

In [3]:
# To view the Series data
series

0       BMW
1    Toyota
2     Honda
dtype: object

##### Note: A Series is the 1-dimensional data type of the Pandas library

In [4]:
colours = pd.Series(["Red", "Blue", "White"])
colours

0      Red
1     Blue
2    White
dtype: object
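
As a quick check of what "1-dimensional" means in practice, the sketch below inspects the `ndim` and `shape` attributes of a Series. The variable name `car_makes` is only for illustration and holds the same values as `series`.

```python
import pandas as pd

# Same values as the `series` example; `car_makes` is an illustrative name
car_makes = pd.Series(["BMW", "Toyota", "Honda"])

print(car_makes.ndim)   # number of dimensions of the Series
print(car_makes.shape)  # a one-element tuple: (number of values,)
print(car_makes[0])     # look up a value by its index label (labels start at 0 by default)
```

A Series has a single axis, which is why its shape has only one entry.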

##### Note: A DataFrame, in contrast to a Series, is a 2-dimensional data type
- DataFrames are used more often than Series.
- `pd.DataFrame` accepts a Python dictionary. The dictionary values can be plain lists or existing Series objects, and each key becomes a column name.

In [5]:
car_data = pd.DataFrame({"Car Make": series, "Colour": colours})
car_data

|  | Car Make | Colour |
|---|---|---|
| 0 | BMW | Red |
| 1 | Toyota | Blue |
| 2 | Honda | White |
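
A DataFrame can also be built straight from a dictionary of plain Python lists, without creating Series objects first. Here is a minimal sketch that reuses the same values; `car_data_from_lists` is an illustrative name.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes that column's values
car_data_from_lists = pd.DataFrame({
    "Car Make": ["BMW", "Toyota", "Honda"],
    "Colour": ["Red", "Blue", "White"],
})

print(car_data_from_lists.ndim)   # 2: one axis for rows, one for columns
print(car_data_from_lists.shape)  # (3, 2): 3 rows, 2 columns

# Selecting a single column gives back a 1-dimensional Series
colour_column = car_data_from_lists["Colour"]
print(type(colour_column))
```

The lists must all have the same length, since each one fills a full column.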


Note: Creating DataFrame objects by hand is tedious and time-consuming, so it is more common to import external files, mainly in .csv format, into the Jupyter notebook workspace as DataFrames.

In [6]:
# Import data (forward slashes avoid invalid escape sequences such as "\D" and also work on Windows)
car_sales = pd.read_csv("../Data Files/car-sales.csv")

In [7]:
car_sales

|  | Make | Colour | Odometer (KM) | Doors | Price |
|---|---|---|---|---|---|
| 0 | Toyota | White | 150043 | 4 | $4,000.00 |
| 1 | Honda | Red | 87899 | 4 | $5,000.00 |
| 2 | Toyota | Blue | 32549 | 3 | $7,000.00 |
| 3 | BMW | Black | 11179 | 5 | $22,000.00 |
| 4 | Nissan | White | 213095 | 4 | $3,500.00 |
| 5 | Toyota | Green | 99213 | 4 | $4,500.00 |
| 6 | Honda | Blue | 45698 | 4 | $7,500.00 |
| 7 | Honda | Blue | 54738 | 4 | $7,000.00 |
| 8 | Toyota | White | 60000 | 4 | $6,250.00 |
| 9 | Nissan | White | 31600 | 4 | $9,700.00 |


# Anatomy of a Pandas DataFrame

If you import the same file into a second variable, say `df`, then `car_sales` and `df` contain exactly the same information; the only difference is the name. Like any other variable, you can name your `DataFrame`s whatever you want, but it is best to choose something simple.

### Anatomy of a DataFrame

Different functions use different labels for the same parts of a `DataFrame`. This graphic sums up the main components of a `DataFrame` and their different names.

<img src="../images/pandas-anatomy-of-a-dataframe.png" alt="pandas dataframe with different sections labelled" width="800" />


- Like Python lists, a Pandas DataFrame's index starts at 0 by default
- A DataFrame has rows and columns, much like a table
- Rows are referred to as axis = 0
- Columns are referred to as axis = 1
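
To see what the two axes mean in practice, here is a small sketch with made-up numbers (the `scores` DataFrame is hypothetical and not part of the car sales data). With `axis=0` an operation runs down the rows and returns one result per column; with `axis=1` it runs across the columns and returns one result per row.

```python
import pandas as pd

# Hypothetical numeric data, used only to illustrate the two axes
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: collapse the rows, giving one total per column
column_totals = scores.sum(axis=0)
print(column_totals)

# axis=1: collapse the columns, giving one total per row
row_totals = scores.sum(axis=1)
print(row_totals)

# drop() follows the same convention: axis=1 drops a column, axis=0 drops a row
without_b = scores.drop("b", axis=1)
without_first_row = scores.drop(0, axis=0)
```

Note that `drop` returns a new DataFrame here and leaves `scores` unchanged.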

In [10]:
car_sales.to_csv("../Data Files/exported_car_sales.csv", index=False)
# car_sales.to_excel("exported_car_sales_1.xlsx")

In [11]:
exported_car_sales_test = pd.read_csv("../Data Files/exported_car_sales.csv")
exported_car_sales_test

|  | Make | Colour | Odometer (KM) | Doors | Price |
|---|---|---|---|---|---|
| 0 | Toyota | White | 150043 | 4 | $4,000.00 |
| 1 | Honda | Red | 87899 | 4 | $5,000.00 |
| 2 | Toyota | Blue | 32549 | 3 | $7,000.00 |
| 3 | BMW | Black | 11179 | 5 | $22,000.00 |
| 4 | Nissan | White | 213095 | 4 | $3,500.00 |
| 5 | Toyota | Green | 99213 | 4 | $4,500.00 |
| 6 | Honda | Blue | 45698 | 4 | $7,500.00 |
| 7 | Honda | Blue | 54738 | 4 | $7,000.00 |
| 8 | Toyota | White | 60000 | 4 | $6,250.00 |
| 9 | Nissan | White | 31600 | 4 | $9,700.00 |
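
The `index=False` argument passed to `to_csv` in the export is worth a closer look. The sketch below writes a small DataFrame to an in-memory buffer with and without it, then reads each result back. `io.StringIO` is used as a stand-in for a file so the example does not depend on any path, and the `small` DataFrame is illustrative, not the full car sales data.

```python
import io

import pandas as pd

small = pd.DataFrame({"Make": ["Toyota", "Honda"], "Doors": [4, 4]})

# Default behaviour: the index is written as an extra, unnamed first column
with_index = io.StringIO()
small.to_csv(with_index)
with_index.seek(0)  # rewind the buffer before reading it back
reimported_with_index = pd.read_csv(with_index)
print(reimported_with_index.columns.tolist())     # ['Unnamed: 0', 'Make', 'Doors']

# index=False: only the real columns are written
without_index = io.StringIO()
small.to_csv(without_index, index=False)
without_index.seek(0)
reimported_without_index = pd.read_csv(without_index)
print(reimported_without_index.columns.tolist())  # ['Make', 'Doors']
```

Without `index=False`, every export and re-import cycle would add another unnamed index column to the data.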


# Reading data files from a URL

We can also read a data file directly from the web, for example from a GitHub repository, by passing its URL to `pd.read_csv`:

heart_disease = pd.read_csv("https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/data/heart-disease.csv")

Note: If you're using a link from GitHub, make sure it points to the "raw" version of the file, which you can get by clicking the Raw button on the file's page.
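
If you only have the regular GitHub page link for a file, the raw link can usually be derived from it. The helper below, `to_raw_github_url`, is a hypothetical convenience function (not part of Pandas). It assumes the common `https://github.com/<user>/<repo>/blob/<branch>/<path>` layout, and the `page_url` value is the assumed page link for the heart disease file.

```python
def to_raw_github_url(url: str) -> str:
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        # Already a raw link, or a layout this sketch does not handle
        return url
    user_repo, path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{user_repo}/{path}"


# Assumed page link for the same file used in the read_csv call
page_url = "https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/data/heart-disease.csv"
raw_url = to_raw_github_url(page_url)
print(raw_url)

# The result can then be passed to pandas (requires network access):
# heart_disease = pd.read_csv(raw_url)
```

Links that are already raw are returned unchanged, so the function is safe to call on either form.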