## DataFrame data structure (2D)
A DataFrame is a two-dimensional pandas data structure that stores data in rows and columns, where both the rows and the columns are labeled. You can think of it as an advanced array of objects or an advanced spreadsheet.
Conceptually, a DataFrame works like a dictionary of Series objects that share the same row index: each column is a Series. The DataFrame API reference is available here (https://pandas.pydata.org/docs/reference/frame.html)
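The "dictionary of Series" idea can be shown directly: a minimal sketch in which a dictionary of two Series (hypothetical sample data keyed by name) becomes a DataFrame whose columns are the dictionary keys and whose row labels come from the shared Series index.

```python
import pandas as pd

# Two Series sharing the same index labels (hypothetical sample data)
ages = pd.Series([28.5, 27.5, 4.5], index=['Janvier', 'Kate', 'Luna'])
heights = pd.Series([5.4, 5.3, 0.8], index=['Janvier', 'Kate', 'Luna'])

# Each dictionary key becomes a column name; the shared index becomes the row labels
people = pd.DataFrame({'Age': ages, 'Height': heights})
print(people)

# Selecting a column hands back a Series again
print(type(people['Age']))
```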

In [1]:
import pandas as pd

### Using DataFrame object

In [33]:
# Creating a DataFrame from a dictionary (all lists must have the same length)
myDict = { 'Name': ['Janvier', 'Kate', 'Luna'], 'Age': [28.5, 27.5, 4.5], 'Height': [5.4, 5.3, 0.8]}
mydataframe = pd.DataFrame(myDict)
mydataframe

|   | Name | Age | Height |
|---|---|---|---|
| 0 | Janvier | 28.5 | 5.4 |
| 1 | Kate | 27.5 | 5.3 |
| 2 | Luna | 4.5 | 0.8 |

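To see why the lists must have the same length, here is a small sketch with a hypothetical dictionary whose `Age` list is one item short; pandas refuses to build the DataFrame and raises a `ValueError`.

```python
import pandas as pd

# Hypothetical input: 'Age' has one value fewer than 'Name'
bad_dict = {'Name': ['Janvier', 'Kate', 'Luna'], 'Age': [28.5, 27.5]}

try:
    pd.DataFrame(bad_dict)
    error_message = None
except ValueError as exc:
    # Lists of different lengths cannot be lined up into rows
    error_message = str(exc)

print(error_message)
```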

A DataFrame is an object with many built-in attributes and methods for accessing and manipulating its data.
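Besides `index` and `columns`, a few other built-in attributes and methods give a quick overview of a DataFrame. The sketch below rebuilds the same three-row DataFrame so it runs on its own.

```python
import pandas as pd

mydataframe = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna'],
                            'Age': [28.5, 27.5, 4.5],
                            'Height': [5.4, 5.3, 0.8]})

print(mydataframe.shape)    # (rows, columns) -> (3, 3)
print(mydataframe.ndim)     # number of dimensions -> 2
print(mydataframe.size)     # total number of cells -> 9
print(mydataframe.dtypes)   # data type of each column
print(mydataframe.head(2))  # first two rows
```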

In [3]:
# Accessing the index
mydataframe.index

RangeIndex(start=0, stop=3, step=1)

In [4]:
# Accessing columns
mydataframe.columns

Index(['Name', 'Age', 'Height'], dtype='object')

In [36]:
# axes returns both axes as a list: the row index and the column labels
mydataframe.axes

[RangeIndex(start=0, stop=3, step=1),
 Index(['Name', 'Age', 'Height'], dtype='object')]
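The two entries of `axes` are exactly the `index` and `columns` objects shown earlier. This sketch unpacks them to confirm that axis 0 is the row index and axis 1 is the column labels.

```python
import pandas as pd

mydataframe = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna'],
                            'Age': [28.5, 27.5, 4.5],
                            'Height': [5.4, 5.3, 0.8]})

# axes is a two-item list: [row index, column labels]
row_axis, column_axis = mydataframe.axes

print(row_axis.equals(mydataframe.index))       # axis 0: the rows
print(column_axis.equals(mydataframe.columns))  # axis 1: the columns
```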

In [5]:
# Type of the DataFrame object
type(mydataframe)

pandas.core.frame.DataFrame

In [6]:
# Type of a single column (each column is a Series)
type(mydataframe.Name)

pandas.core.series.Series
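A single row is also returned as a Series, only indexed the other way round. In this sketch, `iloc` (covered later in this section) picks the first row; its index holds the column names, while a column's index holds the row labels.

```python
import pandas as pd

mydataframe = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna'],
                            'Age': [28.5, 27.5, 4.5],
                            'Height': [5.4, 5.3, 0.8]})

name_column = mydataframe['Name']  # a column: Series indexed by the row labels
first_row = mydataframe.iloc[0]    # a row: Series indexed by the column names

print(type(name_column))
print(type(first_row))
print(first_row.index.tolist())
```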

## Data Selection in a DataFrame
There are three main ways to select rows and columns:
- Integer position
- Labels
- Dot notation (columns only)

In [7]:
# Creating a DataFrame with custom row labels (the index)
dict1 = { 'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'], 'Age': [28.5, 27.5, 4.5, 11], 'Height': [5.4, 5.3, 0.8, 5.3]}
df = pd.DataFrame(dict1, index=['a', 'b', 'c', 'd'])
df

|   | Name | Age | Height |
|---|---|---|---|
| a | Janvier | 28.5 | 5.4 |
| b | Kate | 27.5 | 5.3 |
| c | Luna | 4.5 | 0.8 |
| d | Naomie | 11.0 | 5.3 |

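The three selection styles listed above can reach the same data. This sketch rebuilds `df` and pulls the `Age` column by label, by dot notation and by integer position (`Age` is column 1), then checks that the results match.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

by_label = df['Age']         # label (column name)
by_dot = df.Age              # dot notation (attribute access)
by_position = df.iloc[:, 1]  # integer position: 'Age' is the second column

print(by_label.equals(by_dot))
print(by_label.equals(by_position))
```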

#### Selecting DataFrame Columns

1. Using Label

In [8]:
# Getting a single column from a DataFrame
df['Name']

a    Janvier
b       Kate
c       Luna
d     Naomie
Name: Name, dtype: object

In [9]:
# Getting multiple columns from a DataFrame
df[['Name', 'Age']]

|   | Name | Age |
|---|---|---|
| a | Janvier | 28.5 |
| b | Kate | 27.5 |
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


2. Using Dot Notation

In [10]:
df.Name

a    Janvier
b       Kate
c       Luna
d     Naomie
Name: Name, dtype: object

In [11]:
df.Age

a    28.5
b    27.5
c     4.5
d    11.0
Name: Age, dtype: float64
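Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The sketch below uses hypothetical column names to show both failure cases; bracket notation handles them fine.

```python
import pandas as pd

# Hypothetical column names chosen to show where dot notation breaks down
scores = pd.DataFrame({'first name': ['Janvier', 'Kate'], 'count': [3, 5]})

# Bracket notation works for any column label
print(scores['first name'])
print(scores['count'])

# 'first name' contains a space, so it cannot be written after a dot at all,
# and scores.count resolves to the DataFrame.count() method, not the column
print(callable(scores.count))
```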

#### Selecting DataFrame Rows

1. Using integer position

In [12]:
df[:1]

|   | Name | Age | Height |
|---|---|---|---|
| a | Janvier | 28.5 | 5.4 |


In [13]:
df[2:4]

|   | Name | Age | Height |
|---|---|---|---|
| c | Luna | 4.5 | 0.8 |
| d | Naomie | 11.0 | 5.3 |

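Plain `[]` behaves differently depending on what you pass: a slice selects rows, while a single value is treated as a column label. This sketch shows that `df[0]` therefore raises a `KeyError` instead of returning the first row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

rows = df[1:3]  # a slice inside [] selects rows by position
print(rows)

raised = False
try:
    df[0]  # a single value inside [] is looked up as a column label
except KeyError:
    raised = True
print(raised)
```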

2. Using Label

In [14]:
df[:'a']

|   | Name | Age | Height |
|---|---|---|---|
| a | Janvier | 28.5 | 5.4 |


In [15]:
df['b':'d']

|   | Name | Age | Height |
|---|---|---|---|
| b | Kate | 27.5 | 5.3 |
| c | Luna | 4.5 | 0.8 |
| d | Naomie | 11.0 | 5.3 |

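Note the difference in where the two kinds of slices stop: a positional slice excludes its end position, while a label slice includes its end label. This sketch compares `df[1:3]` with `df['b':'d']`.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

positional = df[1:3]   # stops before position 3
labelled = df['b':'d']  # includes the end label 'd'

print(len(positional), len(labelled))
```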

#### Selecting DataFrame Rows and Columns

1. Using integer position

In [16]:
# Using row position
df[['Name', 'Height']][:2]

|   | Name | Height |
|---|---|---|
| a | Janvier | 5.4 |
| b | Kate | 5.3 |


In [17]:
# Using row position
df[['Name', 'Age']][2:4]

|   | Name | Age |
|---|---|---|
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


2. Using Label

In [18]:
# Using row Label 
df[['Name', 'Age']][:'d']

|   | Name | Age |
|---|---|---|
| a | Janvier | 28.5 |
| b | Kate | 27.5 |
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


In [19]:
# Using row Label 
df[['Name', 'Age']]['c':'d']

|   | Name | Age |
|---|---|---|
| c | Luna | 4.5 |
| d | Naomie | 11.0 |

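The two-step (chained) selection above can also be done in a single `loc` call. The pandas documentation recommends `loc` over chained indexing, especially when assigning values; this sketch only checks that both forms read the same data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

chained = df[['Name', 'Age']]['c':'d']           # columns first, then rows
single_step = df.loc['c':'d', ['Name', 'Age']]  # rows and columns at once

print(chained.equals(single_step))
```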

3. Using `loc` (label-based)

In [20]:
# One column, all rows
df.loc[ : , 'Name']

a    Janvier
b       Kate
c       Luna
d     Naomie
Name: Name, dtype: object

In [21]:
# Multiple columns, all rows
df.loc[ : , ['Name', 'Age']]

|   | Name | Age |
|---|---|---|
| a | Janvier | 28.5 |
| b | Kate | 27.5 |
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


In [22]:
# Multiple columns, selected rows (list of row labels)
df.loc[['a', 'b'], ['Name', 'Height']]

|   | Name | Height |
|---|---|---|
| a | Janvier | 5.4 |
| b | Kate | 5.3 |


In [23]:
# Multiple columns, a range of rows (label slice; the end label 'c' is included)
df.loc['a':'c', ['Name', 'Height']]

|   | Name | Height |
|---|---|---|
| a | Janvier | 5.4 |
| b | Kate | 5.3 |
| c | Luna | 0.8 |

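`loc` also accepts a single row label, with or without a column label. This sketch shows that a row label plus a column label returns one value, while a row label alone returns that row as a Series.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

age_of_b = df.loc['b', 'Age']  # one row label + one column label -> a single value
row_c = df.loc['c']            # one row label only -> that row as a Series

print(age_of_b)
print(row_c['Name'])
```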

4. Using `iloc` (integer position-based)

In [24]:
# Row at position 1 (the second row, label 'b'); ':' selects all columns
df.iloc[1, :]

Name      Kate
Age       27.5
Height     5.3
Name: b, dtype: object

In [25]:
# Row at position 2 (the third row, label 'c'); ':' selects all columns
df.iloc[2, :]

Name      Luna
Age        4.5
Height     0.8
Name: c, dtype: object
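Whether `iloc` returns a Series or a DataFrame depends on how the row is given: a single integer yields a Series, while a list containing that same integer yields a one-row DataFrame. A quick sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

as_series = df.iloc[1, :]   # single integer -> Series
as_frame = df.iloc[[1], :]  # list with one integer -> one-row DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```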

In [26]:
# Rows at positions 1 and 2 (the end position 3 is excluded)
df.iloc[1:3, :]

|   | Name | Age | Height |
|---|---|---|---|
| b | Kate | 27.5 | 5.3 |
| c | Luna | 4.5 | 0.8 |


In [27]:
# All rows, columns at positions 0 and 1 (slice notation)
df.iloc[:, 0:2]

|   | Name | Age |
|---|---|---|
| a | Janvier | 28.5 |
| b | Kate | 27.5 |
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


In [28]:
# Specific rows by a list of positions, all columns
df.iloc[[1, 2, 3], :]

|   | Name | Age | Height |
|---|---|---|---|
| b | Kate | 27.5 | 5.3 |
| c | Luna | 4.5 | 0.8 |
| d | Naomie | 11.0 | 5.3 |


In [29]:
# Specific rows by a list of positions, columns at positions 0 and 1 (slice notation)
df.iloc[[1, 2, 3], 0:2]

|   | Name | Age |
|---|---|---|
| b | Kate | 27.5 |
| c | Luna | 4.5 |
| d | Naomie | 11.0 |


In [30]:
# Rows at positions 0 to 2 and columns at positions 0 and 1 (end positions excluded)
df.iloc[0:3, 0:2]

|   | Name | Age |
|---|---|---|
| a | Janvier | 28.5 |
| b | Kate | 27.5 |
| c | Luna | 4.5 |

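Two more `iloc` details, sketched below: negative positions count from the end just like Python lists, and a positional slice such as `0:3` selects the same rows as the label slice `'a':'c'`, because `iloc` excludes its end position while `loc` includes its end label.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Janvier', 'Kate', 'Luna', 'Naomie'],
                   'Age': [28.5, 27.5, 4.5, 11],
                   'Height': [5.4, 5.3, 0.8, 5.3]},
                  index=['a', 'b', 'c', 'd'])

last_row = df.iloc[-1]             # -1 is the last row, as with a Python list
first_three_pos = df.iloc[0:3]     # positions 0, 1, 2 (end excluded)
first_three_lab = df.loc['a':'c']  # labels 'a' through 'c' (end included)

print(last_row['Name'])
print(first_three_pos.equals(first_three_lab))
```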