
# Libraries We Need

We will need the pandas_datareader package, specifically its pandas_datareader.data module.

Our program takes a start date, an end date, and a list of tickers.
It downloads the open, high, low, close, volume, and adjusted close prices of each stock for the specified time period.
We will also pass in an optional parameter that downloads all available data for each stock in our list of tickers.
Yahoo Finance sometimes blocks IP addresses that request too much data (for instance, if we passed in a list of 500 tickers), so we will add a
condition that pauses our program automatically to keep our request rate down. Let's get started!

# Implementation

We first need to import four libraries: pandas_datareader.data, os, datetime, and time.
We use pandas_datareader to pull data from Yahoo Finance into pandas DataFrames.
We use the os module to create a directory for our stock data if it does not already exist.
We use the datetime module to obtain the current date when the user wants to download all data, and the time module to pause our Python program so that we do not send too many requests to Yahoo Finance.
Let us first create a Python file called data_scraper.py and import the four libraries as follows:

In [0]:
import pandas_datareader.data as web
import os
import datetime
import time
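
As a small illustration of the directory step described above, the sketch below creates an output folder only if it is missing and builds a per-ticker file path. The helper name `csv_path` is hypothetical and not part of this tutorial's code; `os.makedirs(..., exist_ok=True)` is one standard-library way to combine the existence check and the creation into a single call.

```python
import os
import tempfile


def csv_path(directory, ticker):
    """Create `directory` if needed and return the CSV path for `ticker`.

    Hypothetical helper for illustration only.
    """
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, ticker + '.csv')


# Try it in a throwaway temporary location rather than the working directory
base = tempfile.mkdtemp()
path = csv_path(os.path.join(base, 'stock_data'), 'AAPL')
print(path)
```

Using `os.path.join` instead of concatenating with `'/'` keeps the path separator correct on every operating system.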



We will now create our function, called download_data.
It takes a list of tickers, a start date, an end date, and an optional parameter called all_data that downloads all historical data for each ticker.
Inside our function we create a count variable and set it to 1.
Each time we download data for a stock we increment count, and whenever count reaches a multiple of 50 the program sleeps for 10 seconds before pulling more data.
This is just a safety net so that Yahoo Finance does not restrict our access to data.
The all_data parameter defaults to False.
If the user passes all_data=True, we use the datetime module to obtain the current date and format it as a month-day-year string,
which is stored in the end variable. We set start to January 1, 1970, as Yahoo Finance supports dates going back this far.
The nice thing about our scraper is that if a stock started trading on, say, 02-05-1984,
we can still set start to 01-01-1970 and the data will be pulled from the date the stock started trading.
Our implementation is as follows:

In [0]:
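def download_data(tickers, start, end, all_data=False):
    count = 1  # incremented after every download; used to throttle requests
    if all_data:
        # Ignore the start/end arguments and request the full history up to today
        end = datetime.datetime.now()
        end = '%s-%s-%s' % (end.month, end.day, end.year)
        start = '01-01-1970'

    # Create the output directory if it does not already exist
    directory = 'stock_data'
    if not os.path.exists(directory):
        os.makedirs(directory)

    d = {}
    for ticker in tickers:
        filename = directory + '/' + ticker + '.csv'
        # Download the price history and save it as <ticker>.csv
        d[ticker] = web.DataReader(ticker, "yahoo", start, end)
        d[ticker].to_csv(filename)

        count = count + 1
        # Pause for 10 seconds whenever count reaches a multiple of 50
        if count % 50 == 0:
            time.sleep(10)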
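
The date handling in the all_data branch can be tried on its own. The sketch below is a minimal, standalone version; the function name `full_history_range` and its `today` parameter are assumptions added here so the result can be checked with a fixed date.

```python
import datetime


def full_history_range(today=None):
    """Return (start, end) strings covering all history up to `today`.

    Hypothetical helper; mirrors the month-day-year formatting used in
    download_data when all_data=True.
    """
    if today is None:
        today = datetime.datetime.now()
    # Month and day are not zero-padded, e.g. '3-5-2024'
    end = '%s-%s-%s' % (today.month, today.day, today.year)
    start = '01-01-1970'
    return start, end


print(full_history_range(datetime.date(2016, 12, 21)))
# ('01-01-1970', '12-21-2016')
```

Note that the start string is zero-padded while the end string is not; both follow the same month-day-year order.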

The last step in the loop updates the count variable for each stock we download.
To do this we set count = count + 1.
We then check whether count % 50 == 0, and if it is, we use the time module to sleep for 10 seconds.
This makes the program pause once for every 50 stocks downloaded, so that Yahoo Finance does not block us.
For instance, say we passed in a list of 500 tickers.
Because count starts at 1, the first pause would come after 49 tickers, the next after 50 more, and so on, for 10 pauses in total.
This is the final piece of code in our function, which is shown in full in the cell above.
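
Separately from the function itself, the throttling pattern can be sketched without any network access. The version below is an illustrative variant, not this tutorial's code: it uses `enumerate` starting at 1 so that the pause falls after every 50th item, and it accepts the sleep function as a parameter so the behaviour can be checked without waiting. The names `process_with_pauses`, `action`, `batch_size`, and `pause_seconds` are assumptions.

```python
import time


def process_with_pauses(items, action, batch_size=50, pause_seconds=10,
                        sleep=time.sleep):
    """Call `action` on each item, pausing after every `batch_size` items."""
    for position, item in enumerate(items, start=1):
        action(item)
        # position counts completed items, so this fires after item 50, 100, ...
        if position % batch_size == 0:
            sleep(pause_seconds)


# Dry run: record the pauses instead of actually sleeping
pauses = []
seen = []
process_with_pauses(range(120), seen.append, sleep=pauses.append)
print(len(seen), pauses)  # 120 [10, 10]
```

Passing `sleep` in as an argument is a design choice made for this sketch only; in a real run the default `time.sleep` would be used.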


We now need to run our function. To do so we create a list of tickers, a start date,
and an end date, and pass these to our download_data function.
An example is as follows:


In [0]:
if __name__ == '__main__':
    tickers = ['AAPL','BAC','GILD','MSFT']
    start = '2016-01-01'
    end = '2016-12-21'
    download_data(tickers,start,end,all_data=True)



After running the above code, a directory called stock_data will contain AAPL.csv, BAC.csv,
GILD.csv, and MSFT.csv.
Because the call passes all_data=True, the start and end dates are overridden and each file contains all available data for that stock.
To download only the specified start and end dates, leave out all_data=True in the function call.