# DataFrames with Pandas
---

DataFrames let us work with grids of data much as we would in a conventional spreadsheet. They give us labelled rows and columns, filtering, and many other tools for getting insight from our data with little effort.

In [1]:
import pandas as pd
import numpy as np

The table below has a scout report on four different players' shooting, passing, and defending skills.

In [2]:
player_list = ['Pagbo', 'Grazemen','Cantay','Ravane']
skill_list = ['Shooting','Passing','Defending']

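# In this example, a random number generator stands in for our scout. Don't use it for an actual team.
# randint(1, 10, (4, 3)) draws a 4x3 grid of integers from 1 to 9 (the upper bound is excluded).
scores_array = np.random.randint(1, 10, (4, 3))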

df = pd.DataFrame(data=scores_array, index=player_list, columns=skill_list)

print(df)

          Shooting  Passing  Defending
Pagbo            6        7          4
Grazemen         4        8          9
Cantay           3        6          3
Ravane           3        5          9
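
Because the scores are random, the table will differ on every run. If you want the same numbers each time, one option is a seeded generator. The sketch below assumes NumPy's `default_rng`; the helper `make_report` and the seed value are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

player_list = ['Pagbo', 'Grazemen', 'Cantay', 'Ravane']
skill_list = ['Shooting', 'Passing', 'Defending']

def make_report(seed):
    # A seeded generator returns the same "random" scores every time.
    rng = np.random.default_rng(seed)
    # integers(1, 10) draws from 1 to 9; the upper bound is excluded.
    scores = rng.integers(1, 10, size=(4, 3))
    return pd.DataFrame(data=scores, index=player_list, columns=skill_list)

first = make_report(seed=42)
second = make_report(seed=42)

# Same seed, same table.
print(first.equals(second))  # True
```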


`pd.DataFrame` takes three arguments to build a fully labelled table.

- `data` is the values that make up the body
- `index` runs down the vertical axis and names each row
- `columns` runs along the horizontal axis and names each column

There are other ways to create dataframes, but this will be fine for now.
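
As one example of another way, a DataFrame can be built from a dictionary of lists. This is a minimal sketch that reuses the scores from the printed table above; the variable names are only for illustration.

```python
import pandas as pd

# Scores copied from the printed table above, one list per skill.
scores_by_skill = {
    'Shooting': [6, 4, 3, 3],
    'Passing': [7, 8, 6, 5],
    'Defending': [4, 9, 3, 9],
}
players = ['Pagbo', 'Grazemen', 'Cantay', 'Ravane']

# Each dictionary key becomes a column name; index labels the rows.
df_from_dict = pd.DataFrame(scores_by_skill, index=players)
print(df_from_dict)
```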

## Selecting and indexing

Use square brackets with a column name to select a column. The result is a series.

In [3]:
print(df['Shooting'])

Pagbo       6
Grazemen    4
Cantay      3
Ravane      3
Name: Shooting, dtype: int64


To select a row by its label, use `.loc`.

In [4]:
print(df.loc['Pagbo'])

Shooting     6
Passing      7
Defending    4
Name: Pagbo, dtype: int64


To select rows by position, use `.iloc`. Positions start at 0 and a slice excludes its end, so `1:3` returns the second and third rows.

In [5]:
print(df.iloc[1:3])

          Shooting  Passing  Defending
Grazemen         4        8          9
Cantay           3        6          3
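
Both `.loc` and `.iloc` also accept a row and a column together, which picks out a single value. The sketch below rebuilds the table with the scores printed earlier so that it runs on its own, and it also compares a label slice with a position slice.

```python
import pandas as pd

df = pd.DataFrame(
    {'Shooting': [6, 4, 3, 3], 'Passing': [7, 8, 6, 5], 'Defending': [4, 9, 3, 9]},
    index=['Pagbo', 'Grazemen', 'Cantay', 'Ravane'],
)

# Label-based: row 'Grazemen', column 'Passing'.
by_label = df.loc['Grazemen', 'Passing']

# Position-based: second row, second column (counting from 0).
by_position = df.iloc[1, 1]

# Label slices include the end label; position slices exclude the end position.
label_slice = df.loc['Grazemen':'Cantay']
position_slice = df.iloc[1:3]

print(by_label, by_position)               # 8 8
print(label_slice.equals(position_slice))  # True
```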


## Creating and removing columns/rows

DataFrames make it easy to be flexible with our datasets. For example, we can create new columns.

In [6]:
df['Communication'] = np.random.randint(1, 10, 4)
print(df)

          Shooting  Passing  Defending  Communication
Pagbo            6        7          4              9
Grazemen         4        8          9              9
Cantay           3        6          3              8
Ravane           3        5          9              8


To add a new column, put its name in square brackets and assign a series (or any sequence with one value per row). Here we are still using random numbers.
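
The kind of object you assign changes how the values are matched to rows. In the sketch below, the `Stamina` column and its values are hypothetical, added only to show the difference between assigning a list and assigning a series.

```python
import pandas as pd

df = pd.DataFrame(
    {'Shooting': [6, 4, 3, 3], 'Passing': [7, 8, 6, 5]},
    index=['Pagbo', 'Grazemen', 'Cantay', 'Ravane'],
)

# A list is assigned by position, so it needs one value per row.
df['Communication'] = [9, 9, 8, 8]

# A series is matched on its index labels, not on its order.
# 'Stamina' is a made-up column for this example.
stamina = pd.Series({'Ravane': 5, 'Pagbo': 7, 'Cantay': 6, 'Grazemen': 4})
df['Stamina'] = stamina

print(df)
```

Even though `Ravane` comes first in the series, the value 5 still lands in Ravane's row.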

To delete a column from the DataFrame, use `.drop`. It returns a new DataFrame, so we assign the result back to `df`.

In [7]:
# axis=1 refers to columns
df = df.drop('Defending', axis=1)
print(df)

          Shooting  Passing  Communication
Pagbo            6        7              9
Grazemen         4        8              9
Cantay           3        6              8
Ravane           3        5              8


To `.drop` rows, set `axis=0`.

In [8]:
df = df.drop('Grazemen', axis=0)
print(df)

        Shooting  Passing  Communication
Pagbo          6        7              9
Cantay         3        6              8
Ravane         3        5              8
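
The sketch below shows what happens when the result of `.drop` is stored under a different name, using the scores from the first table. It also uses the `columns=` and `index=` keywords, which do the same job as `axis=1` and `axis=0`.

```python
import pandas as pd

df = pd.DataFrame(
    {'Shooting': [6, 4, 3, 3], 'Passing': [7, 8, 6, 5], 'Defending': [4, 9, 3, 9]},
    index=['Pagbo', 'Grazemen', 'Cantay', 'Ravane'],
)

# drop returns a new DataFrame; df itself still has the column.
trimmed = df.drop('Defending', axis=1)
print('Defending' in df.columns)       # True
print('Defending' in trimmed.columns)  # False

# columns= and index= name the axis directly instead of using a number.
smaller = df.drop(columns='Defending', index='Grazemen')
print(smaller.shape)  # (3, 2)
```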


To add a new row, pass its label to `.loc` and assign a list or series with one value per column. Again, we're using random numbers here.

In [9]:
df.loc['Gomez'] = np.random.randint(1,10,3)
print(df)

        Shooting  Passing  Communication
Pagbo          6        7              9
Cantay         3        6              8
Ravane         3        5              8
Gomez          1        2              2
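
The same `.loc` assignment behaves differently depending on whether the label already exists. This sketch uses fixed values in place of random ones; the new scores for Pagbo are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'Shooting': [6, 3, 3], 'Passing': [7, 6, 5], 'Communication': [9, 8, 8]},
    index=['Pagbo', 'Cantay', 'Ravane'],
)

# A label that is not in the index yet adds a new row.
df.loc['Gomez'] = [1, 2, 2]

# A label that already exists overwrites that row instead.
df.loc['Pagbo'] = [7, 7, 9]

print(df)
print(len(df))  # 4
```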


## Conditional Selection

With series, we used a true-or-false condition to select the data that we wanted to see. The same logic applies here.

In [10]:
print(df>5)

        Shooting  Passing  Communication
Pagbo       True     True           True
Cantay     False     True           True
Ravane     False    False           True
Gomez      False    False          False


The comparison above returns a DataFrame of True or False values. Just like with series, we can use these booleans to filter a DataFrame according to our criteria.
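
As a quick sketch of what that looks like with a whole-table condition, the example below rebuilds the table printed above and passes the boolean DataFrame back in square brackets. The names `mask` and `high_scores` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Shooting': [6, 3, 3, 1], 'Passing': [7, 6, 5, 2], 'Communication': [9, 8, 8, 2]},
    index=['Pagbo', 'Cantay', 'Ravane', 'Gomez'],
)

mask = df > 5

# Cells where the mask is False become NaN; the shape stays the same.
high_scores = df[mask]
print(high_scores)
```

No rows or columns are removed here. The scores that fail the condition are blanked out, which is why the remaining numbers print as floats.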

You can also apply it to a specific column (which we already know is just a series).

In [11]:
print(df['Shooting']>5)

Pagbo      True
Cantay    False
Ravane    False
Gomez     False
Name: Shooting, dtype: bool


As expected, we have a series of boolean values. If we pass a boolean series like this to the DataFrame in square brackets, we keep only the rows where the condition is True. Here we use a threshold of 3.

In [12]:
print(df[df['Shooting']>3])

       Shooting  Passing  Communication
Pagbo         6        7              9
