Pandas is a Python library that data analysts often use for data manipulation and analysis.
In this EDA, I will demonstrate some methods of pandas objects by analyzing a dataset of Tesla's historical stock prices.

Below is a summary of the methods we will look at:
* Display: .head(), .tail()
* Data attributes: .index, .columns, .shape
* Select: .loc[], .iloc[], conditional selection
* Summarize: .sum(), .mean(), .median(), .std()
* Sort: .sort_index(), .sort_values()
* Split-apply-combine: .groupby()
* Frequency counts: .groupby().size(), .value_counts()
* Plots: .plot() of any type

First, we import pandas and load the dataset that we will be working with.

In [None]:
import pandas as pd
stock = pd.read_csv('/kaggle/input/tesla-stock-data-from-2010-to-2020/TSLA.csv')
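
As an optional variation (my own suggestion, not what this notebook does), pd.read_csv() can parse the 'Date' column into real datetimes and make it the index in a single call via its parse_dates and index_col parameters. The sketch below reads a tiny made-up CSV from memory with io.StringIO instead of the Kaggle file, so it runs anywhere; the name `stock_demo` and all prices are illustrative only.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for TSLA.csv (made-up values, not real prices)
csv_text = """Date,Open,Close
2010-06-29,10.0,11.0
2010-06-30,11.0,12.0
2010-07-01,12.0,10.5
"""

# parse_dates converts 'Date' to datetimes; index_col makes it the index
stock_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(stock_demo.index.year.tolist())  # [2010, 2010, 2010]
```

With a datetime index, attributes such as .year and .month are available directly on stock_demo.index.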

# Display

Let's check what the imported data looks like by displaying it with the .head() or .tail() method.

.head() displays the first 5 rows of the data. You can choose how many rows to display by passing an integer in the parentheses, e.g. .head(10).
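
A quick sketch of the row-count argument on a small made-up DataFrame (the name `df` and its values are illustrative, not taken from the Tesla file):

```python
import pandas as pd

# Ten rows of made-up closing prices (illustrative values only)
df = pd.DataFrame({"Close": [float(i) for i in range(10)]})

default_rows = df.head()   # no argument: first 5 rows
first_three = df.head(3)   # first 3 rows

print(len(default_rows))              # 5
print(first_three["Close"].tolist())  # [0.0, 1.0, 2.0]
```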

In [None]:
stock.head()

Similar to .head(), .tail() displays the last 5 rows of the dataset.

In [None]:
stock.tail()

We can replace the default index (0, 1, 2, ...) with one of the columns in the dataset, here 'Date'. .set_index() returns a new DataFrame, so we assign the result back to stock.

In [None]:
stock = stock.set_index('Date')
stock.tail()
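
To make the effect of .set_index() concrete, here is a minimal sketch on made-up rows laid out like TSLA.csv (`df`, `indexed`, `restored` and the prices are hypothetical). It also shows .reset_index(), which moves the index back into a regular column.

```python
import pandas as pd

# Made-up rows laid out like TSLA.csv, with 'Date' as an ordinary column
df = pd.DataFrame({
    "Date": ["2010-06-29", "2010-06-30", "2010-07-01"],
    "Close": [11.0, 12.0, 10.5],
})

indexed = df.set_index("Date")   # returns a new DataFrame; df is unchanged

print(df.columns.tolist())       # ['Date', 'Close']
print(indexed.columns.tolist())  # ['Close']
print(indexed.index.name)        # Date

restored = indexed.reset_index()  # moves 'Date' back into a column
print(restored.columns.tolist())  # ['Date', 'Close']
```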

# Data attributes

Pandas is built around two main data structures: Series and DataFrame.

A Series is a one-dimensional labeled array, essentially a single column, while a DataFrame is a two-dimensional table made up of a collection of Series that share the same index.

The data we imported is a DataFrame: pd.read_csv() returns one, and ours has several columns. It looks like the tables we displayed above with .head() and .tail().

For reference, a Series looks like this:

In [None]:
stock['Open']
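
One way to confirm which structure you are holding is to check its type and number of dimensions. A minimal sketch on an illustrative two-column DataFrame (`df` and `col` are hypothetical names):

```python
import pandas as pd

# Illustrative two-column DataFrame (made-up values)
df = pd.DataFrame({"Open": [10.0, 11.0], "Close": [11.0, 12.0]})

col = df["Open"]

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series
print(col.name)            # Open: the Series keeps the column label as its name
print(df.ndim, col.ndim)   # 2 1
```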

In a DataFrame, you can look up the index labels and the column names with the .index and .columns attributes. .tolist() converts the result into a plain Python list, which is easier to read.

In [None]:
stock.index

In [None]:
stock.columns.tolist()

The .shape attribute returns the number of rows and the number of columns in the dataset.

In [None]:
stock.shape

The first number is the number of rows and the second is the number of columns. The index (Date) is not counted as a column.

You can also display just the number of rows or just the number of columns by indexing into .shape:

In [None]:
stock.shape[0]

In [None]:
stock.shape[1]
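
Since .shape is an ordinary tuple, it can also be unpacked in one step. A minimal sketch, assuming a small made-up DataFrame indexed by date like stock after .set_index('Date'); the names `n_rows` and `n_cols` are illustrative:

```python
import pandas as pd

# Made-up data with 'Date' as the index, like stock after set_index('Date')
df = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0], "Close": [11.0, 12.0, 10.5]},
    index=pd.Index(["2010-06-29", "2010-06-30", "2010-07-01"], name="Date"),
)

n_rows, n_cols = df.shape   # .shape is a plain (rows, columns) tuple
print(n_rows, n_cols)       # 3 2  (the 'Date' index is not counted as a column)
print(len(df) == n_rows)    # True: len() on a DataFrame counts rows
```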

# Select

In pandas, you can select columns or rows of data.

Recall that a DataFrame is a collection of Series. Thus, if you select one column, you get a Series as output.

Selecting data in a single column:

In [None]:
stock['Open']

Selecting data in multiple columns (note the double brackets: the inner brackets are a list of column names):

In [None]:
stock[['Open','Close']]
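
The number of brackets decides the return type: a single label gives a Series, while a list of labels (even a list with one name) gives a DataFrame. A minimal sketch on made-up data (all names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Open": [10.0, 11.0, 12.0], "Close": [11.0, 12.0, 10.5]})

one_series = df["Open"]           # single brackets -> Series
one_col_frame = df[["Open"]]      # list with one name -> DataFrame
two_cols = df[["Open", "Close"]]  # list with two names -> DataFrame

print(one_series.shape)     # (3,)
print(one_col_frame.shape)  # (3, 1)
print(two_cols.shape)       # (3, 2)
```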

Selecting data in a row:

* .loc[] selects rows by index label (like looking up a key in a dictionary)
* .iloc[] selects rows by integer position (like indexing a list)
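
A small sketch of the label-versus-position distinction, on a made-up DataFrame indexed by date strings (values are illustrative). One detail worth knowing: a .loc[] slice includes its end label, while an .iloc[] slice excludes its end position, just like Python list slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {"Close": [11.0, 12.0, 10.5, 13.0]},
    index=["2010-06-29", "2010-06-30", "2010-07-01", "2010-07-02"],
)

# .loc looks rows up by label, like keys in a dictionary
by_label = df.loc["2010-06-30", "Close"]      # 12.0

# .iloc looks rows up by integer position, like indices in a list
by_position = df.iloc[1, 0]                   # 12.0 (2nd row, 1st column)

# Slicing: .loc includes the end label, .iloc excludes the end position
loc_slice = df.loc["2010-06-29":"2010-07-01"]  # 3 rows
iloc_slice = df.iloc[0:2]                      # 2 rows
```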


Similar to columns, selecting a single row returns a Series:

In [None]:
stock.loc['2010-06-29']

In [None]:
stock.iloc[0]

Selecting multiple rows:

In [None]:
stock.loc[['2010-06-29','2010-06-30']]

In [None]:
stock.iloc[0:2]
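
Conditional selection, listed in the summary at the top, works by passing a boolean Series to [] or .loc[]. Below is a minimal sketch on made-up data whose column names mirror the Tesla file; the values, the 12.0 threshold and the names `up_day`, `up_days`, etc. are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 12.5], "Close": [11.0, 12.0, 10.5, 13.0]},
    index=["2010-06-29", "2010-06-30", "2010-07-01", "2010-07-02"],
)

# A comparison produces a boolean Series aligned to the index
up_day = df["Close"] > df["Open"]

# Passing that boolean Series to [] keeps only the rows where it is True
up_days = df[up_day]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
big_up_days = df[(df["Close"] > df["Open"]) & (df["Close"] >= 12.0)]

# .loc accepts the same mask and also lets you pick columns
closes_on_up_days = df.loc[up_day, "Close"]
```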

# Summarize

In Pandas, you can summarize your data by using the following:

* .sum(): Adds all the selected values
* .mean(): Finds the average of the selected values
* .median(): Finds the median value of the selected values
* .std(): Finds the standard deviation of selected values

We append .head() to each call to keep the output short.

In [None]:
stock.sum().head()

In [None]:
stock.mean().head()

In [None]:
stock.median().head()

In [None]:
stock.std().head()
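
These methods can also be applied to a single column, or across rows with axis=1. A minimal sketch on made-up numbers (names and values are illustrative). Note that pandas' .std() computes the sample standard deviation (ddof=1) by default, whereas NumPy's np.std defaults to ddof=0.

```python
import pandas as pd

df = pd.DataFrame({"Open": [10.0, 11.0, 12.0, 13.0], "Close": [11.0, 12.0, 10.0, 15.0]})

col_means = df.mean()                # one value per column (default axis=0)
close_median = df["Close"].median()  # a single number for one column
row_means = df.mean(axis=1)          # one value per row instead
close_std = df["Close"].std()        # sample standard deviation (ddof=1)
```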

# Sorting

In pandas, you can sort a dataset in ascending or descending order, either by the values in a column with .sort_values() or by the index with .sort_index().

Sorting by the 'Open' column, then by the index:

In [None]:
stock.sort_values('Open', ascending=True)

In [None]:
stock.sort_index(ascending=True)
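
Both methods also sort in descending order, and .sort_values() accepts a list of columns to break ties. A sketch on a made-up DataFrame with hypothetical letter labels as the index:

```python
import pandas as pd

df = pd.DataFrame(
    {"Years": [2011, 2010, 2011, 2010], "Close": [20.0, 12.0, 25.0, 11.0]},
    index=["d", "b", "c", "a"],
)

# Highest closing price first
by_close_desc = df.sort_values("Close", ascending=False)

# Sort by year, then by Close within each year
by_year_then_close = df.sort_values(["Years", "Close"])

# Index labels in reverse order
by_index_desc = df.sort_index(ascending=False)
```

Each call returns a new, sorted DataFrame and leaves df unchanged.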

# Split-apply-combine

In pandas, you can group rows by the values in one or more columns using the .groupby() method. Since our dataset has no column that naturally groups the rows, I added two new columns, 'Years' and 'Months', extracted from the date index, so that we can group the stock prices by year and month.

In [None]:
stock['Years'] = pd.DatetimeIndex(stock.index).year
stock['Months'] = pd.DatetimeIndex(stock.index).month
stock.head()

For example, you can find Tesla's average stock price for each year by:
1. Splitting the stock prices into groups by year
2. Applying the summary method .mean() to each group
3. Combining the results into a single table

In [None]:
stock.groupby('Years').mean()
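
To see the three steps separately, here is a sketch that performs split, apply and combine by hand on made-up data and compares the result with a single .groupby() call. The names `pieces`, `means`, `manual` and `grouped` and all values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Years": [2010, 2010, 2011, 2011, 2011],
    "Close": [10.0, 12.0, 20.0, 22.0, 24.0],
})

# 1. Split: one sub-DataFrame per year, using boolean masks
pieces = {year: df[df["Years"] == year] for year in sorted(df["Years"].unique())}

# 2. Apply: compute the mean closing price of each piece
means = {year: piece["Close"].mean() for year, piece in pieces.items()}

# 3. Combine: collect the per-year results into one Series
manual = pd.Series(means, name="Close")

# groupby performs all three steps in one expression
grouped = df.groupby("Years")["Close"].mean()
```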

You can group by multiple columns by passing a list of column names.

In [None]:
stock.groupby(['Years','Months']).mean()
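
Grouping by two columns produces a result with a two-level MultiIndex, so a single group is looked up with a (year, month) tuple. Passing as_index=False keeps the keys as ordinary columns instead. A sketch on made-up data (values and names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Years": [2010, 2010, 2010, 2011],
    "Months": [6, 7, 7, 1],
    "Close": [10.0, 12.0, 14.0, 20.0],
})

monthly = df.groupby(["Years", "Months"])["Close"].mean()

# The result has a two-level MultiIndex: (year, month)
print(list(monthly.index.names))  # ['Years', 'Months']
print(monthly.loc[(2010, 7)])     # 13.0

# as_index=False keeps the group keys as ordinary columns instead
flat = df.groupby(["Years", "Months"], as_index=False)["Close"].mean()
```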

# Frequency Count

Frequency counts are not very informative for this dataset (they only show how many rows, i.e. trading days, the dataset has for each year), but they are useful for counting how many rows fall into each category. For example, in a dataset of earthquake records, they could show where earthquakes occur most often.

This can be done with .groupby().size() or .value_counts(). Both count the rows in each category, but .groupby().size() sorts the result by the group key (here, the year), while .value_counts() sorts by count in descending order.

In [None]:
stock.groupby('Years').size()

In [None]:
stock.Years.value_counts()
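
A small sketch of the ordering difference on made-up year labels (`by_key` and `by_count` are illustrative names). Sorting the .value_counts() result by its index recovers the same counts as .groupby().size().

```python
import pandas as pd

df = pd.DataFrame({"Years": [2011, 2010, 2011, 2012, 2011, 2012]})

by_key = df.groupby("Years").size()    # ordered by group key
by_count = df["Years"].value_counts()  # ordered by count, largest first
```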

# Plots

Once you have the data you want, you can plot it directly from pandas, which uses matplotlib to draw the charts.
Pandas supports a variety of plot types. Here are some examples:

In [None]:
stock.plot.scatter(x='Years', y='Close')

In [None]:
# 1. Group the data by the 'Years' column
# 2. Select the column you want to work with ('Close')
# 3. Summarize each group (find the mean)
# 4. Plot the yearly means in a pie chart

stock.groupby('Years').Close.mean().plot.pie()