# Dunder Data Challenge 004 - Finding the Date of the Largest Percentage Stock Price Drop 

In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, find the date on which it had its largest one-day percentage loss. The data is in the `stocks10.csv` file, with each stock's ticker symbol as a column name.

In [1]:
import pandas as pd
stocks = pd.read_csv('../data/stocks10.csv')
stocks.head()

,date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
0,1999-10-25,29.84,2.32,17.02,82.75,,21.45,38.99,16.78,,
1,1999-10-26,29.82,2.34,16.65,81.25,,20.89,37.11,17.28,,
2,1999-10-27,29.33,2.38,16.52,75.94,,20.8,36.94,18.27,,
3,1999-10-28,29.01,2.43,16.59,71.0,,21.19,38.85,19.79,,
4,1999-10-29,29.88,2.5,17.21,70.62,,21.47,39.25,20.0,,


### Challenge

There is a nice, fast solution that uses a minimal amount of code and no loops. Can you return a Series that has the ticker symbols in the index and, as the values, the date on which the largest percentage price drop happened?

#### Extra challenge

Can you return a DataFrame with the ticker symbol as the columns and a row for the date and another row for the percentage price drop?

## Solution

To begin, we need to find the percentage drop for each stock for each day. pandas has a built-in method for this called `pct_change`. By default, it finds the percentage change between the current value and the one immediately above it. Like most DataFrame methods, it treats each column independently from the others. 
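
As a minimal sketch of what `pct_change` computes, the snippet below compares it with the explicit formula (current - previous) / previous on a small made-up table. The `toy` DataFrame and its column names are hypothetical and are not taken from `stocks10.csv`.

```python
import pandas as pd

# Hypothetical prices, not taken from stocks10.csv
toy = pd.DataFrame({'A': [10.0, 11.0, 9.9], 'B': [50.0, 40.0, 44.0]})

# Built-in: change relative to the row immediately above, column by column
builtin = toy.pct_change()

# The same calculation written out: (current - previous) / previous
manual = (toy - toy.shift(1)) / toy.shift(1)
```

The first row of both results is missing because there is no previous row to compare against.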

If we call it on our current DataFrame, we'll get an error because the date column was read in as strings, and `pct_change` only works on numeric data. Let's re-read the data, converting the date column to a datetime and placing it in the index.

In [2]:
stocks = pd.read_csv('../data/stocks10.csv', parse_dates=['date'], index_col='date')
stocks.head()

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,29.84,2.32,17.02,82.75,,21.45,38.99,16.78,,
1999-10-26,29.82,2.34,16.65,81.25,,20.89,37.11,17.28,,
1999-10-27,29.33,2.38,16.52,75.94,,20.8,36.94,18.27,,
1999-10-28,29.01,2.43,16.59,71.0,,21.19,38.85,19.79,,
1999-10-29,29.88,2.5,17.21,70.62,,21.47,39.25,20.0,,


Placing the date column in the index is a key part of this challenge that makes our solution quite a bit nicer. Let's now call the `pct_change` method to get the percentage change for each trading day.

In [3]:
stocks.pct_change().head()

date,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
1999-10-25,,,,,,,,,,
1999-10-26,-0.00067,0.008621,-0.021739,-0.018127,,-0.026107,-0.048217,0.029797,,
1999-10-27,-0.016432,0.017094,-0.007808,-0.065354,,-0.004308,-0.004581,0.057292,,
1999-10-28,-0.01091,0.021008,0.004237,-0.065051,,0.01875,0.051705,0.083196,,
1999-10-29,0.02999,0.028807,0.037372,-0.005352,,0.013214,0.010296,0.010611,,


Let's verify that one of the calculated values is what we expect. MSFT dropped 2 cents, from 29.84 to 29.82, on its second trading day in this dataset. Dividing that change by the previous day's close of 29.84 reproduces the value returned by `pct_change` above.

In [4]:
(29.82 - 29.84) / 29.84

-0.0006702412868632565

Most pandas users know how to get the maximum and minimum value of each column with the methods `max`/`min`. Let's find the largest drop by calling the `min` method.

In [5]:
stocks.pct_change().min()

MSFT   -0.156201
AAPL   -0.517964
SLB    -0.184057
AMZN   -0.247661
TSLA   -0.193274
XOM    -0.139395
WMT    -0.101816
T      -0.126392
FB     -0.189609
V      -0.136295
dtype: float64

For the first part of this challenge, we aren't interested in the size of the largest one-day percentage drop, but in the date on which it happened. Since the date is in the index, we can use the lesser-known `idxmin` method, which returns the index label of the minimum in each column. An analogous `idxmax` method also exists.

In [6]:
stocks.pct_change().idxmin()

MSFT   2000-04-24
AAPL   2000-09-29
SLB    2008-10-15
AMZN   2001-07-24
TSLA   2012-01-13
XOM    2008-10-15
WMT    2018-02-20
T      2000-12-19
FB     2018-07-26
V      2008-10-15
dtype: datetime64[ns]
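
To illustrate why having the dates in the index matters, here is a small sketch on made-up data. The `toy` DataFrame is hypothetical and is not taken from `stocks10.csv`. With the default integer index, `idxmin` can only return row labels, which then need a second lookup to become dates.

```python
import pandas as pd

# Hypothetical prices, not taken from stocks10.csv
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06']),
    'A': [10.0, 9.0, 9.9, 9.8],
    'B': [20.0, 21.0, 10.5, 11.0],
})

# Dates in the index: idxmin returns the dates directly
by_date = toy.set_index('date').pct_change().idxmin()

# Default integer index: idxmin returns integer row labels instead
by_row = toy[['A', 'B']].pct_change().idxmin()

# A second step is needed to translate those row labels into dates
looked_up = by_row.map(toy['date'])
```

Both routes end at the same dates, but the first one gets there in a single chained expression.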

In mathematical terms, this calculation is known as the [arg min or arg max][1].

[1]: https://en.m.wikipedia.org/wiki/Arg_max
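
The connection to arg min can be sketched with NumPy on a made-up Series of returns (the values and dates below are hypothetical). `np.argmin` gives the integer position of the minimum, while `idxmin` gives the index label found at that position.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns, not taken from stocks10.csv
returns = pd.Series(
    [0.01, -0.03, 0.02, -0.08],
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07']),
)

position = np.argmin(returns.to_numpy())  # integer position of the minimum
label = returns.idxmin()                  # index label of the minimum

# Looking up the position in the index recovers the same label
same = returns.index[position] == label
```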

### Extra challenge

Knowing the date of the largest drop is great, but it doesn't tell us what the value of the drop was. We need to return both the minimum and the date of that minimum. This is possible with the `agg` method, which lets us return any number of aggregations from our DataFrame.

An aggregation is any function that returns a single value. Both `min` and `idxmin` return a single value for each column and are therefore aggregations. Here we pass `agg` a list of aggregation names written as strings.

In [7]:
stocks.pct_change().agg(['idxmin', 'min'])

,MSFT,AAPL,SLB,AMZN,TSLA,XOM,WMT,T,FB,V
idxmin,2000-04-24 00:00:00,2000-09-29 00:00:00,2008-10-15 00:00:00,2001-07-24 00:00:00,2012-01-13 00:00:00,2008-10-15 00:00:00,2018-02-20 00:00:00,2000-12-19 00:00:00,2018-07-26 00:00:00,2008-10-15 00:00:00
min,-0.156201,-0.517964,-0.184057,-0.247661,-0.193274,-0.139395,-0.101816,-0.126392,-0.189609,-0.136295
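
Because each column of the `agg` result holds both a timestamp and a float, those columns end up with the generic `object` dtype. As one possible alternative, sketched here on made-up data (the `toy` DataFrame and the `date`/`drop` column names are hypothetical), the same information can be arranged with one row per ticker so that each column keeps a single type.

```python
import pandas as pd

# Hypothetical prices, not taken from stocks10.csv
toy = pd.DataFrame(
    {'A': [10.0, 9.0, 9.9], 'B': [20.0, 21.0, 10.5]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)
changes = toy.pct_change()

# Layout used above: one row per aggregation, one column per ticker
summary = changes.agg(['idxmin', 'min'])

# Alternative layout: one row per ticker, with a datetime column and a float column
by_ticker = pd.DataFrame({'date': changes.idxmin(), 'drop': changes.min()})
```

The first layout answers the extra challenge as stated, while the second may be more convenient for sorting or filtering by the size of the drop.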


# Become a pandas expert

If you are looking to completely master the pandas library and become a trusted expert for doing data science work, check out my book [Master Data Analysis with Python][2]. It comes with over 300 exercises with detailed solutions covering the pandas library in depth.

[2]: https://www.dunderdata.com/master-data-analysis-with-python