# Pandas Concepts #1/4

Please see the documentation:

* [https://pandas.pydata.org/pandas-docs/stable/reference/index.html](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)

## Loading Pandas

In [1]:
import pandas as pd

## Creating a DataFrame

In [2]:
df = pd.DataFrame()  # an empty DataFrame
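
One way to confirm that a DataFrame really is empty is to look at its `empty` attribute and its `shape`. A minimal sketch (the name `df_empty` is only used for this illustration):

```python
import pandas as pd

df_empty = pd.DataFrame()

print(df_empty.empty)   # True when any axis has length 0
print(df_empty.shape)   # (number of rows, number of columns)
```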

### From a list ...

In [4]:
data = [
 [1, 2, 3],
 [4, 5, 6]
]
df = pd.DataFrame(data)

In [5]:
df

Unnamed: 0,0,1,2
0,1,2,3
1,4,5,6
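
By default the columns are labeled 0, 1, 2. As a small sketch, the `columns` argument of `pd.DataFrame` can give them names instead. The names `a`, `b`, `c` are made up for this illustration.

```python
import pandas as pd

data = [
    [1, 2, 3],
    [4, 5, 6],
]

# Column labels "a", "b", "c" are hypothetical names chosen for this example
df_named = pd.DataFrame(data, columns=["a", "b", "c"])

print(df_named.shape)           # (rows, columns)
print(list(df_named.columns))
```

A named column can then be selected with `df_named["b"]` rather than by position.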


### ... or a NumPy array

In [6]:
import numpy as np

In [7]:
np.full((5, 5), -1)  # a 5x5 array filled with -1

array([[-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1]])

In [8]:
df = pd.DataFrame(np.full((5,5), -1))

In [9]:
df

Unnamed: 0,0,1,2,3,4
0,-1,-1,-1,-1,-1
1,-1,-1,-1,-1,-1
2,-1,-1,-1,-1,-1
3,-1,-1,-1,-1,-1
4,-1,-1,-1,-1,-1


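**Q:** Why don't we just use NumPy for everything?

**A:** NumPy arrays work well for homogeneous numeric data, but Pandas adds row and column labels, columns of different types in one table, and built-in readers for formats such as CSV and Excel, which makes everyday data handling much easier.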
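
To make the difference concrete, here is a rough sketch with made-up data: a NumPy array is addressed by position and holds a single dtype, while a DataFrame has named columns that can each have their own dtype. The column names `price`, `qty`, and `label` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up example data: two numeric columns
arr = np.array([[1.5, 10.0], [2.5, 20.0]])

# With NumPy, columns are addressed by position
second_col = arr[:, 1]

# With pandas, columns carry labels (hypothetical names "price" and "qty")
df_demo = pd.DataFrame(arr, columns=["price", "qty"])
df_demo["label"] = ["x", "y"]   # a string column alongside the numeric ones

print(df_demo.dtypes)
print(df_demo["qty"].sum())
```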

### ... from a CSV file

In [14]:
!wget https://github.com/richaude/blue-book/raw/master/scrubbed.csv

--2023-10-05 01:29:28--  https://github.com/richaude/blue-book/raw/master/scrubbed.csv
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/richaude/blue-book/master/scrubbed.csv [following]
--2023-10-05 01:29:28--  https://raw.githubusercontent.com/richaude/blue-book/master/scrubbed.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13929533 (13M) [text/plain]
Saving to: ‘scrubbed.csv’


2023-10-05 01:29:44 (161 MB/s) - ‘scrubbed.csv’ saved [13929533/13929533]



**Documentation** for `read_csv()`

* [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas-read-csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas-read-csv)

In [16]:
df = pd.read_csv("scrubbed.csv")


In [17]:
df

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,45 minutes,This event took place in early fall around 194...,4/27/2004,29.8830556,-97.941111
1,10/10/1949 21:00,lackland afb,tx,us,light,7200,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.9783333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.4180556,-157.803611
...,...,...,...,...,...,...,...,...,...,...,...
80327,9/9/2013 21:15,nashville,tn,us,light,600.0,10 minutes,Round from the distance/slowly changing colors...,9/30/2013,36.165833,-86.784444
80328,9/9/2013 22:00,boise,id,us,circle,1200.0,20 minutes,Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r...,9/30/2013,43.613611,-116.202500
80329,9/9/2013 22:00,napa,ca,us,other,1200.0,hour,Napa UFO&#44,9/30/2013,38.297222,-122.284444
80330,9/9/2013 22:20,vienna,va,us,circle,5.0,5 seconds,Saw a five gold lit cicular craft moving fastl...,9/30/2013,38.901111,-77.265556
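
`read_csv()` accepts many optional parameters. As a sketch, the block below reads a tiny inline CSV (made up to mirror a few columns of `scrubbed.csv`, so it runs without the download) and uses `usecols` and `nrows` to load only part of the data.

```python
import io
import pandas as pd

# A tiny inline CSV, made up to mirror a few columns of scrubbed.csv
csv_text = """datetime,city,state,shape
10/10/1949 20:30,san marcos,tx,cylinder
10/10/1949 21:00,lackland afb,tx,light
10/10/1956 21:00,edna,tx,circle
"""

# usecols keeps only the listed columns; nrows stops after that many data rows
df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["city", "shape"],
    nrows=2,
)

print(df_small)
```

The same two parameters can be passed when reading the file from disk, which is handy for a quick look at a large file.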


### ... from an Excel file 

In [10]:
!wget https://github.com/Subham2S/EDA-Car-Sales/raw/main/Car_Sales.xlsx

--2023-10-05 01:21:24--  https://github.com/Subham2S/EDA-Car-Sales/raw/main/Car_Sales.xlsx
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Subham2S/EDA-Car-Sales/main/Car_Sales.xlsx [following]
--2023-10-05 01:21:24--  https://raw.githubusercontent.com/Subham2S/EDA-Car-Sales/main/Car_Sales.xlsx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 513439 (501K) [application/octet-stream]
Saving to: ‘Car_Sales.xlsx’


2023-10-05 01:21:25 (11.3 MB/s) - ‘Car_Sales.xlsx’ saved [513439/513439]



**Documentation** for `read_excel()`

* [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas-read-excel](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas-read-excel)

In [12]:
df = pd.read_excel("Car_Sales.xlsx")

In [13]:
df

Unnamed: 0,car,price,body,mileage,engV,engType,registration,year,model,drive
0,Ford,15500.0,crossover,68,2.5,Gas,yes,2010,Kuga,full
1,Mercedes-Benz,20500.0,sedan,173,1.8,Gas,yes,2011,E-Class,rear
2,Mercedes-Benz,35000.0,other,135,5.5,Petrol,yes,2008,CL 550,rear
3,Mercedes-Benz,17800.0,van,162,1.8,Diesel,yes,2012,B 180,front
4,Mercedes-Benz,33000.0,vagon,91,,Other,yes,2013,E-Class,
...,...,...,...,...,...,...,...,...,...,...
9571,Hyundai,14500.0,crossover,140,2.0,Gas,yes,2011,Tucson,front
9572,Volkswagen,2200.0,vagon,150,1.6,Petrol,yes,1986,Passat B2,front
9573,Mercedes-Benz,18500.0,crossover,180,3.5,Petrol,yes,2008,ML 350,full
9574,Lexus,16999.0,sedan,150,3.5,Gas,yes,2008,ES 350,front


### ... from a URL (CSV)

In [18]:
pd.read_csv("https://raw.githubusercontent.com/huangjia2019/house/master/house.csv")

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.40,19.0,7650.0,1901.0,1129.0,463.0,1.8200,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.9250,65500.0
...,...,...,...,...,...,...,...,...,...
16995,-124.26,40.58,52.0,2217.0,394.0,907.0,369.0,2.3571,111400.0
16996,-124.27,40.69,36.0,2349.0,528.0,1194.0,465.0,2.5179,79000.0
16997,-124.30,41.84,17.0,2677.0,531.0,1244.0,456.0,3.0313,103600.0
16998,-124.30,41.80,19.0,2672.0,552.0,1298.0,478.0,1.9797,85800.0


Go to the next tutorial:

* [./pandas_pt2_dataframe_operations.ipynb](./pandas_pt2_dataframe_operations.ipynb)

Or jump to the others:

* [./pandas_pt3_more_filtering.ipynb](./pandas_pt3_more_filtering.ipynb)
* [./pandas_pt4_basic_stats.ipynb](./pandas_pt4_basic_stats.ipynb)

