# Read the Data from Yahoo Finance Remotely for One Stock

Use the `pandas_datareader` package to read the data remotely<sup>[[1]](#ft1)</sup>:
+ install it from a shell with `pip install pandas_datareader`

In [None]:
import pandas as pd
# REF: https://pandas.pydata.org/pandas-docs/version/0.18.1/remote_data.html#remote-data-yahoo
# Install pandas_datareader first.
import pandas_datareader.data as web
import datetime

start = datetime.datetime(1970,1,1)
end = datetime.date.today()
# read the data from Yahoo Finance remotely
BHP = web.DataReader("BHP", "yahoo", start, end)

In [None]:
type(BHP)

# How to Obtain the Company List of an Industry

1. Open the Yahoo Finance sector page: https://finance.yahoo.com/sector/ms_basic_materials
2. Apply the screener filters for the industry.
3. Copy the resulting table into a .txt file, then extract the ticker symbols with a regular expression.
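You can try step 3 offline. The sketch below wraps a pattern equivalent to the one in `getIndustryData` (shown later) in a small helper, `extract_symbols`. That name is hypothetical. The helper runs on made-up lines that mimic a tab-separated copy of the screener table, and that file layout is an assumption. The pattern only matches upper-case letters and hyphens followed by a tab. A symbol containing a dot, such as `BRK.B`, is therefore skipped.

```python
import re

# Equivalent to the pattern in getIndustryData: upper-case letters/hyphens
# at the very start of a line, immediately followed by a tab.
SYMBOL_PATTERN = re.compile(r'^[A-Z-]+(?=\t)')

def extract_symbols(lines):
    """Return the ticker symbols found at the start of each line (hypothetical helper)."""
    symbols = []
    for line in lines:
        symbols.extend(SYMBOL_PATTERN.findall(line))
    return symbols

# Made-up lines shaped like a tab-separated copy of the screener table.
sample = [
    "Symbol\tName\tPrice\n",      # header: 'Symbol' has lower-case letters, so no match
    "BHP\tExample Co A\t...\n",
    "GLP-PB\tExample Co B\t...\n",
    "no tab on this line\n",
]
print(extract_symbols(sample))  # ['BHP', 'GLP-PB']
```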

In [9]:
import re
import os
import datetime
import pandas as pd
import pandas_datareader.data as web

def getIndustryData(path: str, start: datetime.datetime, end: datetime.datetime) -> dict:
    # read the txt file copied from the Yahoo Finance screener
    with open(path, 'r') as f:
        lines = f.readlines()
    # a ticker symbol: upper-case letters/hyphens at the start of a line, followed by a tab
    pattern = re.compile(r'^[A-Z-]+(?=\t)')

    # get the company list
    company_list = []
    for line in lines:
        company_list.extend(pattern.findall(line))

    company_dict = {}
    for c in company_list:
        # read the data from Yahoo Finance remotely
        try:
            company_dict[c] = web.DataReader(c, "yahoo", start, end)
        except Exception:
            # print the symbols whose download failed
            print(c)

    return company_dict

def toCsv(d: dict, folder: str):
    # store the data in <parent of the current directory>/data/<folder>/
    # REF: https://www.jianshu.com/p/dde02a88a5c1
    root_path = os.path.abspath(os.path.dirname(os.getcwd()))
    data_path = os.path.join(root_path, 'data')
    path = os.path.join(data_path, folder)
    # REF: https://www.geeksforgeeks.org/create-a-directory-in-python/
    # create the industry directory (and any missing parents); no error if it already exists
    os.makedirs(path, exist_ok=True)
    for k, v in d.items():
        file = os.path.join(path, k + '.csv')  # file location
        v.to_csv(path_or_buf=file, index=True)

start = datetime.datetime(1970, 1, 1)  # start time point
end = datetime.datetime(2021, 1, 1)  # end time point
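`toCsv` writes each ticker to `<parent of the working directory>/data/<folder>/<ticker>.csv`. The sketch below reproduces that layout with `pathlib`. It takes the root directory as an argument so it can run inside a temporary folder, then reads one file back with `pd.read_csv`. `save_frames` and `load_frame` are hypothetical helpers. The toy DataFrame, with made-up values, only stands in for a `DataReader` result.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_frames(frames, folder, root):
    """Write each DataFrame to <root>/data/<folder>/<symbol>.csv (hypothetical helper)."""
    out_dir = Path(root) / 'data' / folder
    out_dir.mkdir(parents=True, exist_ok=True)  # create missing parents, ignore if present
    for symbol, df in frames.items():
        df.to_csv(out_dir / f'{symbol}.csv', index=True)
    return out_dir

def load_frame(out_dir, symbol):
    """Read one saved CSV back, parsing the first column as a DatetimeIndex."""
    return pd.read_csv(Path(out_dir) / f'{symbol}.csv', index_col=0, parse_dates=True)

# Toy data standing in for a DataReader result (values are made up).
toy = pd.DataFrame(
    {'Close': [10.0, 10.5, 10.25], 'Volume': [100, 120, 90]},
    index=pd.DatetimeIndex(['2020-12-29', '2020-12-30', '2020-12-31'], name='Date'),
)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = save_frames({'TOY': toy}, 'example', tmp)
    restored = load_frame(out_dir, 'TOY')  # loaded into memory before the folder is removed

print(restored)
```

Passing `root` explicitly avoids the implicit dependence on the current working directory that `toCsv` has through `os.getcwd()`.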

## Agriculture

Screener filters:

|Attribute|Value|
|:--|:--|
|Price|> 0|
|Sector|Basic Materials|
|Industry|Agricultural Inputs|
|Exchange|NasdaqGS<br>NYSE<br>Nasdaq<br>NasdaqCM<br>NasdaqGM|

In [6]:
agri_dict = getIndustryData('./agriculture.txt', start, end)
# toCsv(agri_dict, 'agriculture')

## Energy

![](https://user-images.githubusercontent.com/22797017/115959053-e398e100-a53c-11eb-9161-db98a1c912d6.png)

The following companies have no data before 2021-01-01, so the download fails for them<sup>[[2]](#ft2)</sup>:
+ CSAN
+ CHK
+ DEN
+ XOG
+ VEI
+ AMR
+ GLP-PB
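`getIndustryData` only prints the symbols whose download raised an exception. The sketch below keeps those symbols instead, together with the exception text, so the list above can be built programmatically. `fetch_all` is a hypothetical helper, and `fake_fetch` simulates the data source. Its `KeyError('Date')` mirrors the error described in footnote [2], but which tickers fail here is simulated, not fetched.

```python
def fetch_all(symbols, fetch):
    """Call fetch(symbol) for each symbol; return (frames, failed) (hypothetical helper)."""
    frames, failed = {}, []
    for symbol in symbols:
        try:
            frames[symbol] = fetch(symbol)
        except Exception as exc:
            # keep the symbol and the reason instead of only printing the symbol
            failed.append((symbol, repr(exc)))
    return frames, failed

def fake_fetch(symbol):
    # Simulated stand-in for web.DataReader(symbol, "yahoo", start, end).
    if symbol in {'CHK', 'XOG'}:
        raise KeyError('Date')
    return f'data for {symbol}'

frames, failed = fetch_all(['BHP', 'CHK', 'XOG'], fake_fetch)
print(sorted(frames))  # ['BHP']
print(failed)          # [('CHK', "KeyError('Date')"), ('XOG', "KeyError('Date')")]
```

With the real data source, `fetch` could be `lambda s: web.DataReader(s, "yahoo", start, end)`, reusing the `start` and `end` defined above.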

In [10]:
energy_dict = getIndustryData('./energy.txt', start, end)
# toCsv(energy_dict, 'energy')

CSAN
CHK
DEN
XOG
VEI
AMR
GLP-PB


## Transportation

![](https://user-images.githubusercontent.com/22797017/115959990-99fec500-a541-11eb-89cd-19ff40f4771e.png)

# Footnotes

<a name="ft1">[1]</a>: How to access Yahoo Finance data remotely (in Chinese): https://blog.csdn.net/Hellolijunshy/article/details/82527643

<a name="ft2">[2]</a>: Getting KeyError : 'Date' in Yahoo https://github.com/pydata/pandas-datareader/issues/640