# 2- DataFrames and Series basics

DataFrames are the core data structure of the pandas library. A DataFrame is essentially a table: a two-dimensional structure organized into rows and columns. Like a dict, it is keyed by column label, and each column holds the values for every row.

A Series is the one-dimensional counterpart of a DataFrame; in fact, accessing a single column of a DataFrame returns a Series. A Series is a labeled one-dimensional array with a single dtype, so it holds one type of data at a time.
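As a minimal sketch of the "one type at a time" point, a Series can also be built directly from a list. The variable names `years` and `mixed` are made up for this illustration and are not used elsewhere in the notebook.

```python
import pandas as pd

# A Series built from a list gets a default integer index (0, 1, 2, ...)
years = pd.Series([1940, 1942, 1943, 1940], name='birthyear')
print(years.ndim)    # 1, a Series is one-dimensional
print(years.dtype)   # a single integer dtype for the whole Series

# Mixing types does not raise an error; pandas falls back to the generic object dtype
mixed = pd.Series([1940, 'unknown'])
print(mixed.dtype)   # object
```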


In [32]:
# Importing pandas
import pandas as pd

DataFrames are much like Python dicts: each key corresponds to a column label, and each value holds that column's data, one entry per row.


In [None]:
# Creating people dict
people = {
    'first': ['John', 'Paul', 'George', 'Ringo'],
    'last': ['Lennon', 'McCartney', 'Harrison', 'Starr'],
    'birthyear': [1940, 1942, 1943, 1940],
    'email': [
        'john.lennon@email.com',
        'paul.mccartney@email.com',
        'george.harrison@email.com',
        'ringo.starr@email.com',
    ],
}

In [None]:
# Looking up the list stored under the 'email' key
people['email']

['john.lennon@email.com',
 'paul.mccartney@email.com',
 'george.harrison@email.com',
 'ringo.starr@email.com']

In [None]:
# Converting dict into a DF
df = pd.DataFrame(people)

In [None]:
# Showing DF
df

    first       last  birthyear                      email
0    John     Lennon       1940      john.lennon@email.com
1    Paul  McCartney       1942   paul.mccartney@email.com
2  George   Harrison       1943  george.harrison@email.com
3   Ringo      Starr       1940      ringo.starr@email.com
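To make the dict analogy concrete, the following sketch checks how the dict's keys and lists map onto the resulting DataFrame. It uses a shortened, two-row version of the `people` dict so that it stays self-contained.

```python
import pandas as pd

# Shortened version of the people dict, for illustration only
people = {
    'first': ['John', 'Paul'],
    'birthyear': [1940, 1942],
}
df = pd.DataFrame(people)

print(list(df.columns) == list(people))  # True: the keys became column labels, in order
print(df.shape)                          # (2, 2): number of rows, number of columns
print(list(df.index))                    # [0, 1]: default integer row labels
```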


## Accessing data in DF

As mentioned above, accessing a single column of a DataFrame returns a Series. You can select a column either with `[]` or with dot notation. Dot notation is less reliable: it fails when the column name is not a valid Python identifier, and it returns the wrong object when the name matches an existing `pd.DataFrame` method or attribute.


In [23]:
df['email']

0        john.lennon@email.com
1     paul.mccartney@email.com
2    george.harrison@email.com
3        ringo.starr@email.com
Name: email, dtype: object

In [24]:
df.email

0        john.lennon@email.com
1     paul.mccartney@email.com
2    george.harrison@email.com
3        ringo.starr@email.com
Name: email, dtype: object
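The name-collision problem with dot notation can be sketched as follows. The `scores` frame and its `count` column are hypothetical and chosen only because `count` is also a `DataFrame` method.

```python
import pandas as pd

# Hypothetical frame: the column name 'count' collides with DataFrame.count()
scores = pd.DataFrame({'name': ['a', 'b'], 'count': [3, 5]})

by_bracket = scores['count']  # the column, returned as a Series
by_dot = scores.count         # the bound method DataFrame.count, not the column

print(type(by_bracket).__name__)  # Series
print(callable(by_dot))           # True
```

Bracket notation works for any column label, which is why it is the safer default.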

You can show all of the DF's columns by using the `.columns` attribute.


In [25]:
df.columns

Index(['first', 'last', 'birthyear', 'email'], dtype='object')

## `iloc` and `loc`

Bracket notation selects columns by label, but it cannot select a single row by its label or position: `df[0]` looks for a column named `0`.

To select rows, use either `iloc` or `loc`.

### `iloc`

`iloc` selects rows by integer position. Pass a single integer for one row, or a list of integers for several. To narrow the selection to specific columns, add a second indexer after a comma, again given as integer position(s).

`iloc` works purely on the current position of rows and columns in the DataFrame, so it does not accept column names.

### `loc`

`loc` selects by label rather than by position. Rows are addressed by their index labels and columns by their names.

Unlike `iloc`, `loc` ignores where a row currently sits in the DataFrame and relies only on its label. In this DataFrame the default index labels happen to be the integers 0 to 3, so labels and positions coincide.


In [26]:
df.iloc[[0, 1], 2]

0    1940
1    1942
Name: birthyear, dtype: int64

In [27]:
df.loc[[0, 1], 'email']

0       john.lennon@email.com
1    paul.mccartney@email.com
Name: email, dtype: object
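With the default index, position and label give the same rows, so the difference is easy to miss. The sketch below reorders a shortened copy of the DataFrame so the two diverge; `by_name` is a name introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['John', 'Paul', 'George', 'Ringo'],
    'birthyear': [1940, 1942, 1943, 1940],
})

# Sorting moves the rows, but each row keeps its original index label
by_name = df.sort_values('first')  # index order is now 2, 0, 1, 3

top_by_position = by_name.iloc[0, 0]     # row currently at the top, first column
top_by_label = by_name.loc[0, 'first']   # row whose label is 0, column 'first'

print(top_by_position)  # George
print(top_by_label)     # John
```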

### Slicing

Both indexers support slicing, with syntax like that of Python lists and tuples. There is one important difference: `iloc` slices exclude the stop position, as in regular Python, while `loc` slices include the stop label, for both rows and columns.


In [28]:
df.loc[0:2, 'first']

0      John
1      Paul
2    George
Name: first, dtype: object

In [31]:
df.loc[0:2, 'first':'birthyear']

    first       last  birthyear
0    John     Lennon       1940
1    Paul  McCartney       1942
2  George   Harrison       1943
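For comparison, the same slice bounds behave differently with `iloc`. A minimal sketch on a shortened copy of the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['John', 'Paul', 'George', 'Ringo'],
    'last': ['Lennon', 'McCartney', 'Harrison', 'Starr'],
})

by_position = df.iloc[0:2]  # positions 0 and 1; the stop is excluded
by_label = df.loc[0:2]      # labels 0, 1 and 2; the stop is included

print(len(by_position))  # 2
print(len(by_label))     # 3
```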
