# Analyzing stock prices

For this project, I'll analyse stock market data downloaded from [Yahoo Finance](https://finance.yahoo.com/) using the [yahoo_finance](https://pypi.python.org/pypi/yahoo-finance) Python package. 

The data consists of daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the NASDAQ stock exchange. It is stored in the prices folder.

Each file in the prices folder is named after a stock symbol and contains the following columns:

- date: the trading day
- close: the closing price on that day
- open: the opening price on that day
- high: the highest price reached during trading that day
- low: the lowest price reached during trading that day
- volume: the number of shares traded that day

## Reading in the data

The data is stored in one CSV file per stock, so we need to iterate over all the files. Each file is read into a dataframe and stored in a dictionary, with the stock symbol as the key and the dataframe as the value.

In [1]:
import pandas as pd
import os

In [54]:
stocks = {}
for fn in os.listdir("prices"):
    df = pd.read_csv(os.path.join("prices", fn))
    # use the file name without its extension as the stock symbol
    stocks[os.path.splitext(fn)[0]] = df
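
The loop above assumes that every entry in the prices folder is a CSV file. A slightly more defensive sketch, assuming the same one-file-per-symbol layout, could look like the following. The helper name `load_prices` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_prices(folder):
    """Return {symbol: DataFrame} for every .csv file in `folder`.

    `load_prices` is a hypothetical helper. It assumes one CSV per symbol,
    named <symbol>.csv, as described above.
    """
    stocks = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # Path.stem drops the extension, e.g. "aapl.csv" -> "aapl".
        stocks[path.stem] = pd.read_csv(path)
    return stocks
```

Calling `load_prices("prices")` would then build the same kind of dictionary as the loop, while skipping any non-CSV files that happen to sit in the folder.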

In [55]:
stocks['aapl']

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
2,2007-01-05,85.049997,85.770000,86.199997,84.400002,208685400
3,2007-01-08,85.470000,85.959998,86.529998,85.280003,199276700
4,2007-01-09,92.570003,86.450003,92.979999,85.150000,837324600
...,...,...,...,...,...,...
2585,2017-04-10,143.169998,143.600006,143.880005,142.899994,18473000
2586,2017-04-11,141.630005,142.940002,143.350006,140.059998,30275300
2587,2017-04-12,141.800003,141.600006,142.149994,141.009995,20238900
2588,2017-04-13,141.050003,141.910004,142.380005,141.050003,17652900


## Computing aggregates

In [56]:
ave_close = {}
for s in stocks:
    ave_close[s] = stocks[s]['close'].mean()    

In [57]:
# stock with minimum average closing price
min_ave_close = min(ave_close, key=ave_close.get)
print(min_ave_close, ave_close[min_ave_close])

blfs 0.8122763011583011


In [58]:
# stock with maximum average closing price
max_ave_close = max(ave_close, key=ave_close.get)
print(max_ave_close, ave_close[max_ave_close])

amzn 275.13407757104255
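
As an alternative to the dictionary plus `min`/`max` with a key function, the averages can be held in a pandas Series, which offers `idxmin` and `idxmax` for this kind of lookup. The sketch below uses a small made-up `stocks` dictionary so it runs on its own; the symbols and prices are illustrative, not taken from the real data.

```python
import pandas as pd

# Toy stand-in for the `stocks` dictionary; the symbols and prices are made up.
stocks = {
    "aaa": pd.DataFrame({"close": [1.0, 2.0, 3.0]}),
    "bbb": pd.DataFrame({"close": [10.0, 20.0]}),
    "ccc": pd.DataFrame({"close": [0.5, 0.7]}),
}

# One mean per symbol, held in a Series indexed by symbol.
ave_close = pd.Series({s: df["close"].mean() for s, df in stocks.items()})

lowest = ave_close.idxmin()   # symbol with the smallest average close
highest = ave_close.idxmax()  # symbol with the largest average close
print(lowest, ave_close[lowest])
print(highest, ave_close[highest])
```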


## Organising trades by date

In [59]:
len(stocks)

560

In [None]:
trade_dates = {}
for s in stocks:
    for index, row in stocks[s].iterrows():
        day = row['date']
        vol = row['volume']
        pair = (vol, s)
        if day not in trade_dates:
            trade_dates[day] = []
        trade_dates[day].append(pair)
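
With several hundred stocks and thousands of rows each, `iterrows` can be slow because it builds a Series for every row. A possible variation, sketched here on made-up data, uses `itertuples` and a `defaultdict` to produce the same structure. The toy `stocks` dictionary is an assumption for illustration only.

```python
from collections import defaultdict

import pandas as pd

# Toy stand-in for `stocks`; symbols, dates, and volumes are made up.
stocks = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 300]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [250, 50]}),
}

trade_dates = defaultdict(list)  # missing days start as an empty list
for symbol, df in stocks.items():
    # itertuples yields lightweight namedtuples and avoids building a
    # Series per row, which iterrows does.
    for row in df[["date", "volume"]].itertuples(index=False):
        trade_dates[row.date].append((row.volume, symbol))
```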


## Finding the most-traded stock each day

Next, I'll build a dictionary where the keys are the days and the values are tuples of the form (trade_vol, stock_symbol). Here stock_symbol is the most-traded stock on that day, and trade_vol is its trade volume (taken from the volume column).

In [None]:
most_traded_by_day = {}

for day in trade_dates:
    trade_dates[day].sort()
    most_traded_by_day[day] = trade_dates[day][-1]

In [None]:
print(list(most_traded_by_day.items())[0:2])
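
Sorting each day's list works, but only the largest pair is needed, so `max` is enough and leaves the lists unmodified. A minimal sketch on made-up data, assuming the same (volume, symbol) tuple layout:

```python
# Toy stand-in for `trade_dates`; the dates, volumes, and symbols are made up.
trade_dates = {
    "2007-01-03": [(100, "aaa"), (250, "bbb"), (175, "ccc")],
    "2007-01-04": [(300, "aaa"), (50, "bbb"), (300, "ccc")],
}

# Tuples compare element by element, so max() picks the largest volume and
# breaks ties on the symbol, the same result as sorting and taking [-1].
most_traded_by_day = {day: max(pairs) for day, pairs in trade_dates.items()}
```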

## Finding days with unusually high trading volumes

To do this, I'll create a dictionary where the keys are the dates and the values are the total volume of that day (over all stocks).

In [None]:
trade_dates['2007-01-03']

In [None]:
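total_trades = {}
for day in trade_dates:
    day_vol = 0
    for vol, stock in trade_dates[day]:
        day_vol += vol
    # key by date so that two days with equal totals don't overwrite each other
    total_trades[day] = day_vol

In [None]:
# build (total_volume, date) pairs so that sorting orders by volume
total_list = [(vol, day) for day, vol in total_trades.items()]
total_list.sort(reverse=True)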

### The 10 days with the highest trading volumes

In [None]:
total_list[:10]
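
The same totals could also be computed in pandas by stacking all the per-stock dataframes and grouping by date. The sketch below assumes each dataframe has date and volume columns, as described earlier, and uses made-up numbers.

```python
import pandas as pd

# Toy stand-in for `stocks`; symbols, dates, and volumes are made up.
dates = ["2007-01-03", "2007-01-04", "2007-01-05"]
stocks = {
    "aaa": pd.DataFrame({"date": dates, "volume": [100, 300, 20]}),
    "bbb": pd.DataFrame({"date": dates, "volume": [250, 10, 10]}),
}

# Stack every symbol's rows into one long frame, then sum volume per date.
all_rows = pd.concat(list(stocks.values()), ignore_index=True)
total_by_day = all_rows.groupby("date")["volume"].sum()

# nlargest returns the n biggest totals, largest first.
top_days = total_by_day.nlargest(2)
print(top_days)
```

For the real data, `nlargest(10)` would give the ten busiest days.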

## Finding the most profitable stocks

I'll do this by:

- Subtracting the initial close price (first row) from the final close price (last row), then expressing the difference as a percentage of the initial price. This tells us how much an initial investment would have grown or shrunk.
- Sorting all of the percentages.
- Finding the 10 stocks that grew the most in the time period.

In [None]:
price_change = []
for s in stocks:
    # select the close column by name rather than by position
    initial_price = stocks[s]['close'].iloc[0]
    final_price = stocks[s]['close'].iloc[-1]
    change = (final_price - initial_price) / initial_price * 100
    price_change.append((round(change,2), s))

In [None]:
price_change.sort(reverse=True)
price_change[:10]
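
To make the arithmetic concrete, here is a small worked sketch on made-up prices. The helper name `percent_change` is hypothetical; it assumes each dataframe has a close column with rows in chronological order.

```python
import pandas as pd


def percent_change(df):
    """Percentage change from the first to the last closing price."""
    initial = df["close"].iloc[0]
    final = df["close"].iloc[-1]
    return (final - initial) / initial * 100


# Toy stand-in for `stocks`; the symbols and prices are made up.
stocks = {
    "aaa": pd.DataFrame({"close": [50.0, 60.0, 75.0]}),  # rose from 50 to 75
    "bbb": pd.DataFrame({"close": [20.0, 18.0, 10.0]}),  # fell from 20 to 10
}

# (percentage, symbol) pairs, largest gain first.
price_change = sorted(
    ((round(percent_change(df), 2), s) for s, df in stocks.items()),
    reverse=True,
)
print(price_change)
```

A stock that goes from 50 to 75 gains (75 - 50) / 50 * 100 = 50 percent, while one that goes from 20 to 10 loses 50 percent.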

In [None]:
stocks['admp'].iloc[0]

In [None]:
stocks['admp'].iloc[-1]

The most profitable stock to buy would have been ADMP, which appreciated from around 7 cents at the start of the period to 4.43 at the end.

Possible next steps:
    
- Which stocks would have been the best to short-sell at the start of the period?
- Which stocks have the most after-hours trading and show the biggest changes between the closing price and the next day's opening price?
- Can technical indicators like Bollinger Bands help us forecast the market?
- Which time periods have resulted in steady increases in prices, and which periods have resulted in steady declines?
- Based on price, what was the optimal day to buy each stock if we wanted to hold them until now?
- On days with high trading volume, do stocks move in one direction (up or down) more than the other one?
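
As a starting point for the question about the gap between one day's close and the next day's open, a rough sketch might look like this. It assumes rows are sorted by date in ascending order. The column name `overnight_gap_pct` is made up for illustration, as are the prices.

```python
import pandas as pd

# Toy rows in the same column layout as the price files; values are made up.
df = pd.DataFrame({
    "date": ["2007-01-03", "2007-01-04", "2007-01-05"],
    "open": [10.0, 11.0, 9.0],
    "close": [10.5, 10.0, 9.5],
})

# shift(1) moves each close down one row, aligning it with the next day's open.
prev_close = df["close"].shift(1)

# Percentage gap between the previous close and today's open.
# The first row has no previous close, so its gap is NaN.
df["overnight_gap_pct"] = (df["open"] - prev_close) / prev_close * 100
print(df)
```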