# Week 4 - Collecting data from Yahoo - use of yfinance

## What is an API?

API stands for application programming interface: a set of definitions and protocols for building and integrating application software. Most APIs require authorization: the service provider issues an access token that allows you to call the API and pull the data. We will explore this in the next level of the exercise, where we fetch data from Twitter. In this basic level, we will use a Python library that takes care of the authorization process for us, which makes it very friendly to beginners.
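
To make the idea of an access token concrete, the sketch below builds (but does not send) an authorized request using only Python's standard library. The URL and the token are placeholders invented for illustration, and the `Bearer` header scheme is a common convention rather than something every provider uses.

```python
from urllib.request import Request

# Hypothetical values: replace with the endpoint and token issued by a real provider
API_URL = "https://api.example.com/v1/quotes?symbol=F"
ACCESS_TOKEN = "your-access-token"

# The token travels with the request in a header, so the provider can check
# who is calling before it returns any data
request = Request(API_URL, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})

print(request.full_url)
print(request.get_header("Authorization"))
```

Nothing is sent over the network here; the object only shows where the token sits in a request.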

## About yfinance

Please visit the yfinance documentation via this [link](https://pypi.org/project/yfinance/), and read the legal disclaimer before using the library. yfinance is a Python library that gives the public easy access to the financial data available on [Yahoo Finance](https://uk.finance.yahoo.com/). One of its big advantages is that it returns the data as pandas DataFrames, which we introduced in last week's tutorial. In most cases where we pull data from an API, transforming the data into an organised format is one of the most difficult parts. yfinance organises the data for us, which is a plus for beginners.

## The learning outcomes of this exercise

1. Using a publicly available Python library;
2. Using a library's documentation to solve problems;
3. Collecting data and organising it properly in a pandas DataFrame;
4. Using descriptive techniques to analyse companies' profiles.

<img src="https://github.com/okenta0524/20251025_data_analytics_week4/blob/main/Image/stocks-vs-market-wo.jpg?raw=1" width=400 height=400 />

In [1]:
# importing the libraries required in this exercise

import pandas as pd
import yfinance as yf

### 1. Retrieving data based on a company's Stock Ticker

Simply put, the Stock Ticker is the company's ID, which enables the analyst to retrieve the company's data. For lists of Stock Tickers, you can visit this [link](https://www.nyse.com/listings_directory/stock) for the NYSE and this [link](https://www.nasdaq.com/market-activity/stocks/screener) for the NASDAQ. The `Ticker` class in yfinance allows you to access a company's data through its Stock Ticker.

With the alias `yf` we defined above, we can create a `Ticker` object by typing `yf.Ticker("Insert The Ticker Symbol Here")`. I take Ford Motor as an example below:

In [2]:
FORD = yf.Ticker("F")

# using the .info attribute to retrieve the background information of the company
FORD.info

{'address1': 'One American Road',
 'city': 'Dearborn',
 'state': 'MI',
 'zip': '48126',
 'country': 'United States',
 'phone': '313 322 3000',
 'website': 'https://www.ford.com',
 'industry': 'Auto Manufacturers',
 'industryKey': 'auto-manufacturers',
 'industryDisp': 'Auto Manufacturers',
 'sector': 'Consumer Cyclical',
 'sectorKey': 'consumer-cyclical',
 'sectorDisp': 'Consumer Cyclical',
 'longBusinessSummary': 'Ford Motor Company develops, delivers, and services Ford trucks, sport utility vehicles, commercial vans and cars, and Lincoln luxury vehicles worldwide. It operates through Ford Blue, Ford Model e, Ford Pro, and Ford Credit segments. The company sells Ford and Lincoln internal combustion engine and hybrid vehicles, electric vehicles, service parts, accessories, and digital services for retail customers, as well as develops software. It also sells Ford and Lincoln vehicles, service parts, and accessories through distributors and dealers, as well as through dealerships to com
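
Since `.info` returns a plain Python dictionary, individual fields can be read by key. A minimal sketch, using a small dictionary copied from the output above in place of a live call; the helper name `pick_fields` is made up for illustration.

```python
# A few entries copied from the FORD.info output above
info = {
    "city": "Dearborn",
    "state": "MI",
    "country": "United States",
    "industry": "Auto Manufacturers",
    "sector": "Consumer Cyclical",
}


def pick_fields(info_dict, keys):
    """Return only the requested keys; missing keys map to None."""
    # .get avoids a KeyError when a field is absent from the dictionary
    return {key: info_dict.get(key) for key in keys}


# "fax" is not in the sample dictionary, so it comes back as None
profile = pick_fields(info, ["sector", "industry", "fax"])
print(profile)
```

In the notebook, the same helper could be applied to the live data with `pick_fields(FORD.info, [...])`.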

### 2. Defining our observation period

In the sampling process, we always specify a time frame to make our data collection more efficient. The `.history` method lets us specify the details; its parameters are described as follows:

        period : str
            Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
            Either Use period parameter or use start and end
        interval : str
            Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
            Intraday data cannot extend last 60 days
        start: str
            Download start date string (YYYY-MM-DD) or _datetime.
            Default is 1900-01-01
        end: str
            Download end date string (YYYY-MM-DD) or _datetime.
            Default is now
        prepost : bool
            Include Pre and Post market data in results?
            Default is False
        auto_adjust: bool
            Adjust all OHLC automatically? Default is True
        back_adjust: bool
            Back-adjusted data to mimic true historical prices
        proxy: str
            Optional. Proxy server URL scheme. Default is None
        rounding: bool
            Round values to 2 decimal places?
            Optional. Default is False = precision suggested by Yahoo!
        tz: str
            Optional timezone locale for dates.
            (default data is returned as non-localized dates)
        timeout: None or float
            If not None stops waiting for a response after given number of
            seconds. (Can also be a fraction of a second e.g. 0.01)
            Default is None.
            
**We use the default settings for nearly all the parameters here, but we'd like to change the period (or dates) and the interval.**

In [4]:
# Downloading the most recent three months of daily stock market data

FORD = yf.Ticker('F')
FORD_Three_Months = FORD.history(period = '3mo', interval = '1d')
FORD_Three_Months

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2025-07-22 00:00:00-04:00 | 11.091060 | 11.140398 | 10.933180 | 11.041722 | 79403800 | 0.0 | 0.0 |
| 2025-07-23 00:00:00-04:00 | 11.130530 | 11.308145 | 11.130530 | 11.229205 | 76163200 | 0.0 | 0.0 |
| 2025-07-24 00:00:00-04:00 | 11.160133 | 11.278542 | 11.081192 | 11.110795 | 63557800 | 0.0 | 0.0 |
| 2025-07-25 00:00:00-04:00 | 11.189735 | 11.337747 | 11.091059 | 11.318012 | 52312300 | 0.0 | 0.0 |
| 2025-07-28 00:00:00-04:00 | 11.327879 | 11.337747 | 11.100927 | 11.130529 | 54173600 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-10-15 00:00:00-04:00 | 11.680000 | 11.780000 | 11.640000 | 11.760000 | 88960700 | 0.0 | 0.0 |
| 2025-10-16 00:00:00-04:00 | 11.740000 | 11.790000 | 11.640000 | 11.740000 | 108701200 | 0.0 | 0.0 |
| 2025-10-17 00:00:00-04:00 | 11.740000 | 12.010000 | 11.720000 | 11.920000 | 107633500 | 0.0 | 0.0 |
| 2025-10-20 00:00:00-04:00 | 11.930000 | 12.080000 | 11.910000 | 11.990000 | 77878900 | 0.0 | 0.0 |
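
The docstring says to use either `period` or `start` and `end`. The sketch below expresses that rule as a small helper that prepares the keyword arguments for `.history`. The helper `history_kwargs` is hypothetical and not part of yfinance, and it does not check whether the values themselves are valid.

```python
def history_kwargs(interval="1d", period=None, start=None, end=None):
    """Build keyword arguments for Ticker.history from either a period or dates."""
    uses_dates = start is not None or end is not None
    if period is not None and uses_dates:
        raise ValueError("use either period, or start and end, not both")
    if period is None and not uses_dates:
        raise ValueError("give a period, or a start and/or end date")

    kwargs = {"interval": interval}
    if period is not None:
        kwargs["period"] = period
    else:
        # keep only the dates that were actually supplied
        if start is not None:
            kwargs["start"] = start
        if end is not None:
            kwargs["end"] = end
    return kwargs


print(history_kwargs(period="3mo"))
print(history_kwargs(start="2025-07-22", end="2025-10-21"))
```

The result can be unpacked into the call, for example `FORD.history(**history_kwargs(period="3mo"))`.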


### 2.Q. Exercise of downloading stock price data and visualising it

**<span style="color:green">Now it's your turn to practice: Can you download Apple's hourly stock price reaction (over one week) to the unveiling of the iPhone 16/16 Pro?</span>**

Hint: You will need to check the unveiling date of the iPhone models and adjust the parameters of `history`. Please use the unveiling date of the models, not the date they went on sale. You can use the `matplotlib` library ([examples of the use of matplotlib](https://matplotlib.org/stable/gallery/index.html)) for the visualisation.

In [5]:
#Please show your code of downloading the data below
APPLE = yf.Ticker('AAPL')
APPLE_One_Week = APPLE.history(period = '1wk', interval = '1h')
APPLE_One_Week


| Datetime | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2025-10-15 09:30:00-04:00 | 249.380005 | 251.820007 | 248.639999 | 250.850006 | 8163531 | 0.0 | 0.0 |
| 2025-10-15 10:30:00-04:00 | 250.869995 | 251.469193 | 249.800003 | 249.859894 | 3430819 | 0.0 | 0.0 |
| 2025-10-15 11:30:00-04:00 | 249.820007 | 250.049698 | 248.720001 | 249.774994 | 2879510 | 0.0 | 0.0 |
| 2025-10-15 12:30:00-04:00 | 249.779999 | 249.949905 | 247.479996 | 248.0905 | 2290291 | 0.0 | 0.0 |
| 2025-10-15 13:30:00-04:00 | 248.104996 | 249.800003 | 248.050003 | 249.647003 | 1934212 | 0.0 | 0.0 |
| 2025-10-15 14:30:00-04:00 | 249.649994 | 250.100006 | 249.479996 | 249.919998 | 2139459 | 0.0 | 0.0 |
| 2025-10-15 15:30:00-04:00 | 249.929993 | 250.070007 | 248.169998 | 249.429993 | 4229173 | 0.0 | 0.0 |
| 2025-10-16 09:30:00-04:00 | 248.270004 | 248.380005 | 246.619995 | 247.509995 | 9564057 | 0.0 | 0.0 |
| 2025-10-16 10:30:00-04:00 | 247.490005 | 249.039993 | 247.481705 | 248.235001 | 4826932 | 0.0 | 0.0 |
| 2025-10-16 11:30:00-04:00 | 248.169998 | 248.663101 | 247.035004 | 247.720001 | 3305994 | 0.0 | 0.0 |
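
For an event study like this one, a fixed window is easier to reproduce with `start` and `end` than with `period`, because a `period` is measured back from the time you run the code. A sketch of computing a one-week window follows; `event_window` is a hypothetical helper, and the date in the example is an arbitrary placeholder, not the unveiling date.

```python
from datetime import date, timedelta


def event_window(event_date, days=7):
    """Return (start, end) as YYYY-MM-DD strings, starting on the event date."""
    start = date.fromisoformat(event_date)
    end = start + timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Placeholder date for illustration only
start, end = event_window("2024-01-15")
print(start, end)
```

The two strings can then be passed on, for example `APPLE.history(start=start, end=end, interval='1h')`. Keep in mind that the docstring in Section 2 notes a limit on how far back intraday data can extend.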


In [None]:
#Please complete the code to visualise the stock price reaction

import matplotlib.pyplot as plt
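
One possible starting point for the plot, using plain lists so that it runs without a download. The closing prices are the 2025-10-15 rows copied from the output above; with real data you would pass the DataFrame's index and its `Close` column instead. The labels and figure size are choices made for this illustration.

```python
from datetime import datetime

import matplotlib.pyplot as plt

# Hourly bars for 2025-10-15 (09:30 to 15:30), copied from the output table above
hours = [datetime(2025, 10, 15, hour, 30) for hour in range(9, 16)]
closes = [250.850006, 249.859894, 249.774994, 248.0905,
          249.647003, 249.919998, 249.429993]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(hours, closes, marker="o")  # one point per hourly bar
ax.set_xlabel("Time")
ax.set_ylabel("Close (USD)")
ax.set_title("AAPL hourly close, 2025-10-15")
fig.autofmt_xdate()  # tilt the time labels so they do not overlap
```

In a notebook the figure is displayed under the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.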


### 3. Extracting Fundamental Data

In industry research and academic studies, fundamental data (normally reported in annual reports and earnings reports) such as a firm's revenue, cost of goods sold, and R&D investment are heavily used.

The financial data can be extracted through the `.financials` attribute.

In [None]:
FORD = yf.Ticker('F')
FORD_financials = FORD.financials
FORD_financials

### 3.Q. Exercise of downloading financial data for a single company

**<span style="color:green">Now it's your turn to practice: Can you download the balance sheet and cash flow data of Ford and integrate all three DataFrames, including the financials from the last example?</span>**

Hint: To extract the balance sheet and cash flow data, use the `.balancesheet` and `.cashflow` attributes. To merge DataFrames, use `pd.concat` in pandas. `.to_csv` is the method for exporting a DataFrame to a CSV file.
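
A sketch of how `pd.concat` stacks statements that share the same columns. The three small DataFrames below are toy stand-ins with made-up numbers, not Ford's figures; in the exercise they would come from `FORD.financials`, `FORD.balancesheet`, and `FORD.cashflow`. The `keys` labels are a choice made for this illustration.

```python
import pandas as pd

periods = ["2024-12-31", "2023-12-31"]  # reporting dates as columns

# Toy stand-ins: line items as rows, made-up values
financials = pd.DataFrame(
    {periods[0]: [100.0, 60.0], periods[1]: [90.0, 55.0]},
    index=["Total Revenue", "Cost Of Revenue"],
)
balance_sheet = pd.DataFrame(
    {periods[0]: [500.0], periods[1]: [480.0]}, index=["Total Assets"]
)
cash_flow = pd.DataFrame(
    {periods[0]: [20.0], periods[1]: [15.0]}, index=["Free Cash Flow"]
)

# keys adds an outer index level recording which statement each row came from
combined = pd.concat(
    [financials, balance_sheet, cash_flow],
    keys=["financials", "balancesheet", "cashflow"],
)

# Without a path, to_csv returns the CSV text; pass a file name to write a file
csv_text = combined.to_csv()
print(combined)
```

Labelling the rows with `keys` keeps the source of each line item visible after the three tables are stacked.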

In [None]:
#Please show your code of integrating the financials, balancesheet, and cashflow below:



### 4. Dealing with multiple companies in data collection

To get a comprehensive picture of an industry or a set of competitors, we often want to explore patterns across multiple companies. In the example below, we create a list of companies for the data collection.

In [None]:
tickerStrings = ['AMZN','NFLX']
df_list = []  # creating an empty list
for ticker in tickerStrings:
    # Ticker.history (see Section 2) returns one flat table of daily prices per company
    data = yf.Ticker(ticker).history(period='2y', interval='1d')
    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
    df_list.append(data)  # adding the collected data to the list

# combine all dataframes into a single dataframe
df = pd.concat(df_list)

# save to csv
df.to_csv('demonstration.csv')

df.head()

In [None]:
# restructuring the data as a pivot table for visualisation

df.pivot_table(index='Date',columns='ticker',values='Open').plot()
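
To see what `pivot_table` does to the stacked data, here is a sketch on a tiny long-format table. The tickers `AAA` and `BBB` and all prices are invented for illustration.

```python
import pandas as pd

# Long format: one row per (date, ticker), the shape produced by stacking
# one table per company
long_df = pd.DataFrame(
    {
        "ticker": ["AAA", "AAA", "BBB", "BBB"],
        "Open": [10.0, 11.0, 20.0, 19.0],
    },
    index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-02", "2025-01-03"]),
)
long_df.index.name = "Date"

# reset_index turns the Date index into an ordinary column before pivoting
# Wide format: one row per date, one column per ticker
wide_df = long_df.reset_index().pivot_table(
    index="Date", columns="ticker", values="Open"
)
print(wide_df)
```

Calling `.plot()` on the wide table then draws one line per ticker, since each column is treated as a separate series.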

### 4.Q. Exercise of downloading data from multiple companies

**<span style="color:green">Now it's your turn to practice: Can you download the daily stock price data of Ford, Toyota, and Tesla for the most recent 5 years and visualise them?</span>**

In [None]:
# Please show your code below:

