# DataFrame
---
- 2-dimensional: data is aligned in a tabular fashion in rows and columns.
- Columns can be of different types.
- Size-mutable: rows and columns can be added or removed.
- Labeled axes (rows and columns).
- Arithmetic operations can be performed on rows and columns.
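The properties above can be seen together in a small example. This is a minimal sketch with made-up values. The column names `name`, `a`, `b` and the row labels `r1`, `r2`, `r3` are hypothetical.

```python
import pandas as pd

# Hypothetical data: two labeled axes, columns of different types (str, int, float)
df = pd.DataFrame(
    {"name": ["x", "y"], "a": [1, 2], "b": [10.0, 20.0]},
    index=["r1", "r2"],
)

# Column arithmetic is element-wise and aligned on the row labels
df["a_plus_b"] = df["a"] + df["b"]

# Row arithmetic: select a row by label, then operate across its columns
print(df.loc["r1", ["a", "b"]] * 2)

# Size-mutable: a new row can be added after construction
df.loc["r3"] = ["z", 3, 30.0, 33.0]
print(df)
print(df.shape)  # (3, 4)
```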

## DataFrame Constructor 
```pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)```
where:

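- data : the input data, which can take various forms such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
- index : row labels for the resulting frame (optional). Defaults to a RangeIndex (0, 1, ..., n-1) if no index is passed.
- columns : column labels for the resulting frame (optional). Defaults to a RangeIndex (0, 1, ..., k-1) if no column labels are passed and the data does not provide them (for example as dict keys).
- dtype : data type to force on the frame (optional). Only a single dtype is allowed; if None, the type of each column is inferred.
- copy : whether to copy the input data. In older pandas versions the default is False; newer versions default to None, which copies dict input but not DataFrame or 2-D ndarray input.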
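A minimal sketch of the `data`, `index`, `columns` and `dtype` parameters, using a small NumPy array. The labels `r1`, `r2`, `x`, `y`, `z` are hypothetical.

```python
import numpy as np
import pandas as pd

# A 2 x 3 ndarray as data, with explicit row labels (index) and column labels (columns)
arr = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(arr, index=["r1", "r2"], columns=["x", "y", "z"], dtype=float)
print(df)
print(df.dtypes)  # every column is float64 because dtype=float was forced

# Without index/columns, pandas falls back to integer labels 0..n-1
df_default = pd.DataFrame(arr)
print(df_default.index.tolist(), df_default.columns.tolist())  # [0, 1] [0, 1, 2]
```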

## Create an Empty DataFrame
 
 

```python
import pandas as pd
df = pd.DataFrame()
print(df)
```

```
Empty DataFrame
Columns: []
Index: []
```
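An empty DataFrame can also be created with its column labels declared up front. The `empty` attribute reports whether either axis has length 0. A short sketch (the column names are hypothetical):

```python
import pandas as pd

# No rows and no columns
df = pd.DataFrame()
print(df.empty, df.shape)  # True (0, 0)

# Column labels declared up front, but still no rows
df_cols = pd.DataFrame(columns=["Name", "Age"])
print(df_cols.empty, df_cols.shape)  # True (0, 2)
```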


## Create a DataFrame from Lists

```python
data = [1, 2, 3, 4, 5]
dfl = pd.DataFrame(data)
print(dfl)
```

```
   0
0  1
1  2
2  3
3  4
4  5
```


```python
data = [['Falano0', 21], ['Falano1', 22], ['Falano3', 25]]
dfl1 = pd.DataFrame(data, columns=['Name', 'Age'])
print(dfl1)
```

```
      Name  Age
0  Falano0   21
1  Falano1   22
2  Falano3   25
```
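The `index` parameter can replace the default 0..n-1 row labels. A sketch reusing the same rows; the labels `p0`, `p1`, `p2` are hypothetical:

```python
import pandas as pd

# Each inner list becomes one row; index= labels the rows
data = [["Falano0", 21], ["Falano1", 22], ["Falano3", 25]]
df = pd.DataFrame(data, columns=["Name", "Age"], index=["p0", "p1", "p2"])
print(df)
print(df.loc["p1", "Age"])  # 22
```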


## Create a DataFrame from Dictionaries

```python
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
dfd = pd.DataFrame(data)
print(dfd)
```

```
   a   b     c
0  1   2   NaN
1  5  10  20.0
```


```python
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
dfd1 = pd.DataFrame(data, index=['first', 'second'])
print(dfd1)
```

```
        a   b     c
first   1   2   NaN
second  5  10  20.0
```


```python
# columns= keeps only the listed dictionary keys; key 'c' is not listed, so it is dropped
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(df1)
```

```
        a   b
first   1   2
second  5  10
```
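The examples above pass a list of dictionaries, one per row. A dictionary of lists is the column-oriented alternative, where each key becomes a column name. A sketch reusing the names and ages from the list example:

```python
import pandas as pd

# Column-oriented input: each key is a column name, each list holds that column's values
data = {"Name": ["Falano0", "Falano1", "Falano3"], "Age": [21, 22, 25]}
df = pd.DataFrame(data)
print(df)

# For dict input, columns= selects (and orders) a subset of the keys
df_age = pd.DataFrame(data, columns=["Age"])
print(df_age)
```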


## Create a DataFrame from Series 


```python
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
dfs = pd.DataFrame(d)
print(dfs)
```

```
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
```
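When the Series have different indexes, the row index of the result is their union, so label `d` gets NaN in column `one`. Passing `index=` instead fixes the rows to exactly the labels given. A sketch, where the extra label `z` is hypothetical:

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}

# Default: union of the Series indexes
print(pd.DataFrame(d).index.tolist())  # ['a', 'b', 'c', 'd']

# index= restricts (or extends) the rows; labels missing from a Series become NaN
df_sub = pd.DataFrame(d, index=["b", "d", "z"])
print(df_sub)
```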


## Column Selection, Addition, and Deletion
---

```python
dfs['two']
```

```
a    1
b    2
c    3
d    4
Name: two, dtype: int64
```
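A single label inside `[]` returns a Series, while a list of labels returns a DataFrame with the columns in the order given. A short sketch rebuilding the same `dfs`:

```python
import pandas as pd

dfs = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

col = dfs["two"]           # single label -> Series
sub = dfs[["two", "one"]]  # list of labels -> DataFrame, in the order given
print(type(col).__name__, type(sub).__name__)  # Series DataFrame
print(sub.columns.tolist())                    # ['two', 'one']
```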

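```python
# Add a column; the values are aligned on the index labels
dfs['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'], dtype=int)
# Row 'd' has no value, so it becomes NaN and the column is upcast to float64
print(dfs)
```

```
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
```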


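```python
# Remove a column: pop() deletes it from dfs and returns it as a Series
dfs.pop('three')
```

```
a    10.0
b    20.0
c    30.0
d     NaN
Name: three, dtype: float64
```

```python
# Alternative: del also removes a column, but returns nothing.
# Use only one of pop() or del: after pop('three') the column is gone and del would raise KeyError.
# del dfs['three']
```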
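Columns can also be computed from other columns or inserted at a specific position. A sketch on the same `dfs`; the column names `total` and `label` are hypothetical:

```python
import pandas as pd

dfs = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

# A derived column; NaN in 'one' propagates through the addition (row 'd')
dfs["total"] = dfs["one"] + dfs["two"]

# insert() places a column at a given position instead of at the end
dfs.insert(0, "label", ["w", "x", "y", "z"])
print(dfs)

# pop() removes and returns a column; del only removes it
removed = dfs.pop("total")
del dfs["label"]
print(dfs.columns.tolist())  # ['one', 'two']
```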

## Row Selection, Addition, and Deletion

### Selection by label

```python
print(dfs.loc['b'])
# The result is a Series indexed by the DataFrame's column names,
# and its name is the row label used to retrieve it.
# 'two' prints as 2.0 because a row mixing float and int columns is upcast to float64.
```

```
one    2.0
two    2.0
Name: b, dtype: float64
```
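`.loc` also accepts a column label, lists of labels, and label slices. Note that a label slice includes its end label. A sketch on the same `dfs`:

```python
import pandas as pd

dfs = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

print(dfs.loc["b", "one"])              # 2.0: row label and column label give one value
print(dfs.loc[["a", "d"], ["two"]])     # lists of labels give a sub-DataFrame
print(dfs.loc["b":"c"].index.tolist())  # ['b', 'c']: the end label is included
```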


### Selection by integer location

```python
print(dfs)
print(dfs.iloc[2])  # row at position 2 (0-based), i.e. label 'c'
```

```
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
one    3.0
two    3.0
Name: c, dtype: float64
```
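`.iloc` takes integer positions for both rows and columns. Unlike `.loc`, a position slice excludes its stop position. A sketch on the same `dfs`:

```python
import pandas as pd

dfs = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

print(dfs.iloc[2, 1])                # 3: row position 2 ('c'), column position 1 ('two')
print(dfs.iloc[1:3].index.tolist())  # ['b', 'c']: stop position 3 is excluded
print(dfs.iloc[-1].name)             # d: negative positions count from the end
```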


```python
# Slicing with [] selects rows by position: the first two rows
print(dfs[:2])
```

```
   one  two
a  1.0    1
b  2.0    2
```
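Besides slices, `[]` also accepts a boolean Series, which keeps the rows where the condition is True. A sketch on the same `dfs`:

```python
import pandas as pd

dfs = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

# A slice inside [] is positional, the same as .iloc
print(dfs[:2].equals(dfs.iloc[:2]))  # True

# A boolean mask keeps the matching rows (here 'c' and 'd')
print(dfs[dfs["two"] > 2])
```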


### Addition of Rows

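```python
dfr = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat() replaces it.
# The columns are unioned, so cells with no source value become NaN.
dfs = pd.concat([dfs, dfr])
print(dfs)
```

```
   one  two    a    b
a  1.0  1.0  NaN  NaN
b  2.0  2.0  NaN  NaN
c  3.0  3.0  NaN  NaN
d  NaN  4.0  NaN  NaN
0  NaN  NaN  5.0  6.0
1  NaN  NaN  7.0  8.0
```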
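When the frames share the same columns, `pd.concat` simply stacks the rows. `ignore_index=True` renumbers the result instead of keeping duplicate labels. A single row can also be added by assigning to a new label with `.loc`. A sketch with hypothetical frames `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# Same columns: rows are stacked, and ignore_index=True gives labels 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# Assigning to a label that does not exist yet appends one row
stacked.loc[4] = [9, 10]
print(stacked.shape)  # (5, 2)
```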


### Drop rows


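```python
# Drop the row labelled 0. drop() returns a new DataFrame; dfs itself is not modified.
dfs.drop(0)
```

|   | one | two | a | b |
|---|---|---|---|---|
| a | 1.0 | 1.0 | NaN | NaN |
| b | 2.0 | 2.0 | NaN | NaN |
| c | 3.0 | 3.0 | NaN | NaN |
| d | NaN | 4.0 | NaN | NaN |
| 1 | NaN | NaN | 7.0 | 8.0 |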


```python
# In IPython/Jupyter, `dfs.drop?` shows the docstring; in plain Python use:
help(dfs.drop)
```

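```python
# dfs still contains row 0, because the result of drop() was not assigned back
dfs
```

|   | one | two | a | b |
|---|---|---|---|---|
| a | 1.0 | 1.0 | NaN | NaN |
| b | 2.0 | 2.0 | NaN | NaN |
| c | 3.0 | 3.0 | NaN | NaN |
| d | NaN | 4.0 | NaN | NaN |
| 0 | NaN | NaN | 5.0 | 6.0 |
| 1 | NaN | NaN | 7.0 | 8.0 |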
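`drop` can remove rows or columns. Because it returns a new frame, the result must be reassigned (or `inplace=True` passed) to keep it. A sketch with a hypothetical frame `df`:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4, 5, 6]}, index=["a", "b", "c"])

no_b = df.drop("b")                # drop a row by label (axis=0 is the default)
no_two = df.drop(columns=["two"])  # drop a column by label
print(no_b.index.tolist(), no_two.columns.tolist(), df.shape)  # df is still (3, 2)

# Reassign to keep the result
df = df.drop(index=["a", "c"])
print(df)
```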


# Importing CSV Files in pandas
---
```pandas.read_csv```

```python
# In IPython/Jupyter, `pd.read_csv?` shows the docstring; in plain Python use:
help(pd.read_csv)
```

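```python
# Assumes owid-covid-data.csv (the Our World in Data COVID-19 dataset) is in the working directory
covid_cases = pd.read_csv("owid-covid-data.csv")
```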
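The OWID file may not be available, so this sketch passes a few hypothetical rows to `read_csv` through `io.StringIO`. Only the column names `iso_code`, `continent` and `location` come from the dataset used here; the `date` column and all row values are made-up placeholders. The sketch shows the `usecols` and `parse_dates` parameters, and that empty fields are read as NaN.

```python
import io

import pandas as pd

# Hypothetical stand-in for owid-covid-data.csv (made-up rows)
csv_text = """iso_code,continent,location,date
ABW,North America,Aruba,2020-03-13
ABW,North America,Aruba,2020-03-14
,,International,2020-03-13
"""

# usecols loads only the listed columns; parse_dates converts 'date' to a datetime dtype
df = pd.read_csv(io.StringIO(csv_text), usecols=["location", "date"], parse_dates=["date"])
print(df.dtypes)
print(df.shape)  # (3, 2)

# Empty fields are read as NaN by default
full = pd.read_csv(io.StringIO(csv_text))
print(full["iso_code"].isna().tolist())  # [False, False, True]
```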

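```python
print(type(covid_cases))
# Other quick looks at the data:
# covid_cases.loc?      # IPython/Jupyter: docstring for .loc
# covid_cases.head(20)  # first 20 rows
# covid_cases.tail(5)   # last 5 rows
# .loc slices by label and includes the end label, so :2 returns rows 0, 1 and 2
covid_cases.loc[:2, "location"]
```

```
<class 'pandas.core.frame.DataFrame'>

0    Aruba
1    Aruba
2    Aruba
Name: location, dtype: object
```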
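With the default integer index, `.loc[:2]` and `.iloc[:2]` differ by one row: the label slice includes 2, and the position slice stops before it. A sketch with a small hypothetical frame:

```python
import pandas as pd

# Hypothetical sample with the default RangeIndex 0..3
df = pd.DataFrame({"location": ["Aruba", "Aruba", "Albania", "Albania"]})

print(len(df.loc[:2, "location"]))  # 3: label slice includes label 2
print(len(df.iloc[:2, 0]))          # 2: position slice excludes position 2
```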

## SQL Select in Pandas

```python
# select iso_code, continent, location from covid_cases where rownum <= 5
covid_cases[['iso_code', 'continent', 'location']].head(5)
```

|   | iso_code | continent | location |
|---|---|---|---|
| 0 | ABW | North America | Aruba |
| 1 | ABW | North America | Aruba |
| 2 | ABW | North America | Aruba |
| 3 | ABW | North America | Aruba |
| 4 | ABW | North America | Aruba |
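An SQL `ORDER BY ... LIMIT n` maps to `sort_values` followed by `head`. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical sample
df = pd.DataFrame({
    "location": ["Aruba", "Albania", "Algeria", "Andorra"],
    "iso_code": ["ABW", "ALB", "DZA", "AND"],
})

# select iso_code, location from df order by location limit 2
top = df.sort_values("location")[["iso_code", "location"]].head(2)
print(top)
```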


```python
# select iso_code, continent, location from covid_cases where location = 'International' and rownum <= 5
# covid_cases["location"] == "International" is a boolean mask, True for the matching rows;
# .loc applies the mask to the rows and the list to the columns in one step.
covid_cases.loc[covid_cases["location"] == "International", ['iso_code', 'continent', 'location']].head(5)
```

|   | iso_code | continent | location |
|---|---|---|---|
| 40923 | NaN | NaN | International |
| 40924 | NaN | NaN | International |
| 40925 | NaN | NaN | International |
| 40926 | NaN | NaN | International |
| 40927 | NaN | NaN | International |
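SQL `OR`/`AND` combine masks with `|`/`&`, with each comparison in parentheses. `DataFrame.query` accepts the same condition as a string, and `IS NULL` maps to `isna()`. A sketch with hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample
df = pd.DataFrame({
    "iso_code": ["ABW", "ABW", np.nan, "ALB"],
    "continent": ["North America", "North America", np.nan, "Europe"],
    "location": ["Aruba", "Aruba", "International", "Albania"],
})

# where continent = 'North America' or location = 'International'
mask = (df["continent"] == "North America") | (df["location"] == "International")
print(df.loc[mask, ["iso_code", "location"]])

# The same filter written with query()
print(df.query("continent == 'North America' or location == 'International'"))

# where continent is null
print(df[df["continent"].isna()])
```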


### Group By

```python
# select location, count(*) from covid_cases group by location
covid_cases.groupby('location').size()
```

```
location
Afghanistan            247
Albania                178
Algeria                247
Andorra                179
Angola                 165
Anguilla               160
Antigua and Barbuda    167
Argentina              205
Armenia                247
Aruba                  169
dtype: int64
```
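Adding `ORDER BY count(*) DESC` corresponds to `sort_values(ascending=False)` on the counts. Other aggregates such as `SUM` and `MAX` go through `agg`. A sketch with hypothetical rows; the `new_cases` column and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample
df = pd.DataFrame({
    "location": ["Aruba", "Aruba", "Albania", "Albania", "Albania", "Algeria"],
    "new_cases": [1, 2, 3, 4, 5, 6],
})

# select location, count(*) from df group by location order by count(*) desc
counts = df.groupby("location").size().sort_values(ascending=False)
print(counts)

# select location, sum(new_cases), max(new_cases) from df group by location
agg = df.groupby("location")["new_cases"].agg(["sum", "max"])
print(agg)
```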

```python
# (number of rows, number of columns)
covid_cases.shape
```

```
(41170, 40)
```