___

<a href='https://www.instagram.com/lanlearning/'> <img src='../pimages/logosmall.png' width="100" height="100"/></a>
___
<center>Copyright LanLearning 2020</center>





# Welcome back to Pandas! Part 2!

In this notebook, we will work with a simple DataFrame and run a few operations on it just to get a feel for what our DataFrame is like. Basically, we will use these operations to describe the frame.

# Exploratory Data Analysis (EDA)

In [None]:
# first things first,

import pandas as pd

## About the Data:
Just to give you a feel for the different domains within data science, we will use various types of datasets. 

Today's dataset will be related to basketball. The file, included in the folder, is ```players.csv``` which was taken from https://www.basketball-reference.com. 

Let's get into it.

In [None]:
players = pd.read_csv('players.csv')
players.head(10)

## What are rows and columns:
Before diving into some of the pandas functions that will be used to describe our frame, let's solidify our basic understanding of rows and columns.

#### What does one row represent?
<br>
<img src='../pimages/kobe.jpeg' width="500"/>
<br>

One row in our ```players``` frame represents one player. There is one and only one individual per row.

In general, one row represents one **instance** or **individual** who we will be analyzing/studying. 

For example: In the coronavirus dataset from the last notebook, one row represented one single day. 

#### What do columns represent? 
In our ```players``` table we have a different column for each fact about the player. For example, one column represents their ```Age```, while another represents their position ```Pos```. 

Columns represent **something** or **some attribute** about each of the individuals in our dataset. This is why we call our columns **attributes** or **features** of the individuals in our dataset. 

For example, in the coronavirus dataset from before, there was a feature that represented the number of ```Positive``` test cases of COVID on a particular day. 

## How Many... ?

Let's take a look at how many instances and features we have in our dataframe:

You can use the ```len()``` function and simply pass in the dataframe to find the number of rows there are:

In [None]:
len(players)

This tells us that there are ```624``` players in the dataframe.

Next let's find the number of features (or attributes) in our table. First we will need to get the list of column titles and then pass it into ```len()```.

In [None]:
players.columns

```.columns``` will tell you all the columns that are in the dataframe. 

Let's pass that into ```len()```: 

In [None]:
len(players.columns)

This tells us there are 30 columns in our DataFrame.

You can also access the number of rows and columns the frame has by using `.shape`.

In [None]:
players.shape # see! the dimensions are the same as the one stated above

`.shape` is a tuple of `(rows, columns)`, so indexing into it gives the same numbers as `len(players)` and `len(players.columns)`.

In [None]:
players.shape[0]

In [None]:
players.shape[1]
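To see that these three ways of counting agree, here is a minimal sketch on a tiny made-up frame. The name `toy` and its contents are invented for illustration and are not rows from ```players.csv```.

```python
import pandas as pd

# Hypothetical mini-roster, made up for illustration (not from players.csv)
toy = pd.DataFrame({
    "Player": ["Ann", "Ben", "Cal"],
    "Pos": ["PG", "C", "SF"],
    "PTS": [12.5, 8.0, 20.1],
    "AST": [7.1, 1.2, 3.4],
})

n_rows = len(toy)            # number of rows (instances)
n_cols = len(toy.columns)    # number of columns (features)

print(n_rows, n_cols)                  # 3 4
print(toy.shape)                       # (3, 4)
print(toy.shape == (n_rows, n_cols))   # True
```

The same checks should work on ```players``` once it is loaded.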

## Accessing Different Parts of the DataFrame:
You can also see certain parts of a DataFrame by using other attributes:
- ```.index``` will show you the different row labels (aka the bolded numbers all the way to the left)
- ```.values``` will show you all the data in the table.

Let's take a look:

In [None]:
players.index

This doesn't print every label; instead, it gives us a compact description of our index (a ```RangeIndex```). 

It **starts** at ```0```, **stops** before ```624``` (so the last label is ```623```), and **increments** by ```1```.

If you want to see the actual values, you need to pass that into ```list()```:

In [None]:
list(players.index)
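As a rough illustration of how the index description maps to actual labels, here is a small sketch on a made-up four-row frame (`toy` is a hypothetical name, not part of the notebook's data). The point it shows is that the `stop` value itself is not one of the labels.

```python
import pandas as pd

# Hypothetical four-row frame, made up for illustration
toy = pd.DataFrame({"PTS": [12.5, 8.0, 20.1, 5.5]})

idx = toy.index
print(idx)                          # RangeIndex(start=0, stop=4, step=1)
print(idx.start, idx.stop, idx.step)

labels = list(idx)                  # the actual row labels
print(labels)                       # [0, 1, 2, 3]
print(labels[-1] == idx.stop - 1)   # True: the last label is stop - 1
```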

In [None]:
players.values # the underlying data as a NumPy array, without the row or column labels

*Note*: ```.columns```, ```.index```, and ```.values``` look a lot like functions. 

**However, they are not.**

They do not have parentheses, which is an extremely important observation to make. ```.index```, ```.columns```, and ```.values``` access the attributes of the ```players``` dataframe object. 

**Attributes** are an important part of what makes Python an object-oriented language. We won't really get into that here, but feel free to do as much research on it as you want.
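One quick way to tell an attribute from a function is Python's built-in `callable()`. The sketch below uses a small made-up frame (`toy` is a hypothetical name) and is only meant to show the distinction, not everything these objects can do.

```python
import pandas as pd

# Hypothetical two-row frame, made up for illustration
toy = pd.DataFrame({"Player": ["Ann", "Ben"], "PTS": [12.5, 8.0]})

# Attributes: accessed without parentheses, they hand back stored data
print(toy.shape)             # (2, 2)
print(list(toy.columns))     # ['Player', 'PTS']

# Functions attached to the frame: they need parentheses to run
print(callable(toy.head))    # True
print(callable(toy.shape))   # False, it is just a tuple

first_row = toy.head(1)      # calling the function returns a new frame
print(len(first_row))        # 1
```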

On a side note, the function below (note the parentheses) prints a summary of the DataFrame: instead of the values themselves, it lists each column's name, its count of non-null entries, and its data type.

In [None]:
players.info()

## Indexing with DataFrames:

Let's learn how we can select certain rows and columns from our DataFrame. 

### Indexing to select rows: 
Since you select a row, you are selecting a player from the dataframe. So think of selecting a few rows as selecting a group of players, say for your team. 

<br>
<img src='../pimages/team.jpg' width="500"/>
<br>

Now there are many ways to select rows from a dataframe. 

### ```.iloc```:
```.iloc[]``` is used to select rows in a dataframe by position. The name is short for integer location, and I like to think of it as index location. 

Index location is basically what it sounds like. You specify the position of the row you want to select, counting from ```0```: 

In [None]:
# selecting the row in position 0, aka the first row
players.iloc[0]

In [None]:
# selecting the row in position 100, aka the 101st row
players.iloc[100]

#### Selecting multiple rows:

In [None]:
# slicing them to select rows at position 0 to 99
players.iloc[0:100]

In [None]:
# selecting specific rows by passing a list of positions:
players.iloc[[0, 10, 200, 300]]
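Here is the same set of ```.iloc``` patterns on a tiny made-up frame, so the results are easy to check by eye. The frame `toy` and its player names are hypothetical.

```python
import pandas as pd

# Hypothetical five-row frame, made up for illustration
toy = pd.DataFrame({
    "Player": ["Ann", "Ben", "Cal", "Dee", "Eli"],
    "PTS": [12.5, 8.0, 20.1, 5.5, 15.0],
})

first = toy.iloc[0]          # single row at position 0
middle = toy.iloc[1:3]       # positions 1 and 2 (position 3 is excluded)
picked = toy.iloc[[0, 4]]    # exactly the rows at positions 0 and 4

print(first["Player"])           # Ann
print(list(middle["Player"]))    # ['Ben', 'Cal']
print(list(picked["Player"]))    # ['Ann', 'Eli']
```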

### ```.loc```:
```.loc[]``` is used to select rows in a dataframe based on the label of the row. In the dataframe we're using, the labels (which line the leftmost side of the dataframe and are in bold) happen to be the **same** as the rows' positions, which makes it kind of confusing. The numbers on the leftmost side of the frame are **labels**, not positions. They just happen to match the positions here. 

If we want to select the player whose label says ```99```, we use ```.loc``` for that:

In [None]:
players.loc[99]

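#### Everything else functions similarly to iloc.

One difference: a slice such as ```players.loc[0:100]``` includes the row labelled ```100```, while ```players.iloc[0:100]``` stops at position ```99```.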

##### Note: This particular file ```players.csv``` is a bad example for showing the difference between .loc and .iloc, but we'll make note of that in the future. 

For clarification, imagine you could shuffle the rows (including the row numbers on the left-hand side, which are their labels). `.loc` will give you the row with the given label, and `.iloc` will give you the row at the given position in the current order. But in a DataFrame with the default labels, like ours, `players.iloc[0]` gives the same thing as `players.loc[0]`.
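To make the difference concrete, here is a minimal sketch on a made-up frame whose rows have been reordered. It reverses the rows instead of shuffling them so the result is predictable; `toy` and `flipped` are hypothetical names.

```python
import pandas as pd

# Hypothetical four-row frame, made up for illustration
toy = pd.DataFrame({"Player": ["Ann", "Ben", "Cal", "Dee"]})

# Reverse the row order; each row keeps its original label
flipped = toy.iloc[::-1]
print(list(flipped.index))          # [3, 2, 1, 0]

# .loc looks up the label, .iloc looks up the position
print(flipped.loc[0, "Player"])     # Ann (the row labelled 0)
print(flipped.iloc[0]["Player"])    # Dee (the row currently listed first)

# Slices also differ: .loc includes the end label, .iloc excludes the end position
print(len(toy.loc[0:2]))            # 3 rows (labels 0, 1, 2)
print(len(toy.iloc[0:2]))           # 2 rows (positions 0, 1)
```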

## Selecting Columns:
There are two ways to select columns, but I'm just going to describe the recommended way. That way, you get in the habit of selecting columns in the most practical way:

You simply want to use bracket notation like you did back when you were using dictionaries, and pass in the ```name``` of the column as a ```str```:

In [None]:
players['Pos']

#### To select multiple columns, pass in a list of column titles:

In [None]:
players[['Player', 'Pos', 'AST', 'PTS']]

# Series vs DataFrame:

In [None]:
players['Player']

Look at ^^^ the output above and notice that it looks **different** from the original dataframe, because it's only one column of it. 

If you notice, when you select a single row or single column from the DataFrame, the result doesn't look like a DataFrame; it's just a list of values with their labels. 

This is what a **Series** is. It's just a *series* of values in one line! When you put multiple Series together side by side, you get a DataFrame!
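You can confirm which kind of object you have with `type()` and `.shape`. The sketch below uses a small made-up frame (`toy` is a hypothetical name) and also shows that passing a list with a single column name gives back a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical three-row frame, made up for illustration
toy = pd.DataFrame({"Player": ["Ann", "Ben", "Cal"], "PTS": [12.5, 8.0, 20.1]})

one_col = toy["Player"]       # a single column name gives a Series
one_col_df = toy[["Player"]]  # a list of names gives a DataFrame, even with one name
one_row = toy.iloc[0]         # a single row is also a Series

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
print(type(one_row).__name__)     # Series

print(one_col.shape)              # (3,)   one dimension
print(one_col_df.shape)           # (3, 1) two dimensions
```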

## Recap

**Dataframe Manipulation**
- You can access different parts of a frame using .index, .columns, and .values
    - .index displays labels of each row in frame
    - .columns displays labels of each column in frame
    - .values displays data in frame (excluding the labels)
    - .info() displays the type of data in each column
- You can select columns with brackets.
- You can select rows using either ```.loc``` or ```.iloc```. ```.loc``` is based on the row's label, and ```.iloc``` is based on the row's position

### About this notebook: 
#### Developed by:
* [Milan Butani](https://www.linkedin.com/in/milanbutani/) 
* [Kyra Yee](https://www.linkedin.com/in/kyrayee/)
* [Jacqueline Mei](https://www.linkedin.com/in/jacqueline-mei-9140401aa/)
* [Liam McDonough](https://www.linkedin.com/in/liammmcdonough/)

#### Connect with us:
<a href='https://www.linkedin.com/company/lanlearning/'> <img src=https://img.icons8.com/color/48/000000/linkedin.png width="48" height="48" align="left"/></a>

<a href='http://www.instagram.com/lanlearning'> <img src=https://img.icons8.com/fluent/48/000000/instagram-new.png width="48" height="48" align="left"/></a>

<a href='https://www.youtube.com/channel/UC5_yxU9pz4ka7xITJMxO5WA'> <img src=https://img.icons8.com/color/48/000000/youtube-squared.png width="48" height="48" align="left"/></a>

<a href='https://www.github.com/lanlearning/'> <img src=https://img.icons8.com/material-rounded/48/000000/github.png/ width="48" height="48" align="left"/></a>

