# Data Collection for Investment Analysis
## Introduction

In this notebook, we use the yfinance library to download historical market data for a small set of stocks and save it to disk. The goal is to build datasets for later analysis, visualization, and potential use in machine learning models. The project is an educational exercise in using Python for financial data analysis.
### Objectives:
``Download Historical Stock Data:`` Retrieve historical data for selected stocks.

``Store Data Efficiently:`` Save the collected data in a structured format for further analysis.

``Maintain Data Quality:`` Ensure data completeness and accuracy.

In [31]:
import yfinance as yf
import pandas as pd
from datetime import datetime

def download_data(ticker, start_date, end_date):
    """Download history for one ticker or a list of tickers and save each to its own CSV."""
    # Round-trip through datetime to validate the 'YYYY-MM-DD' format
    # (strptime raises ValueError on malformed input).
    start_date = datetime.strptime(start_date, '%Y-%m-%d').strftime('%Y-%m-%d')
    end_date = datetime.strptime(end_date, '%Y-%m-%d').strftime('%Y-%m-%d')
    # Treat a single ticker string as a one-element list so both cases share
    # one code path (the original else branch referenced an undefined `i`).
    tickers = ticker if isinstance(ticker, list) else [ticker]
    for symbol in tickers:
        data = yf.download(symbol, start=start_date, end=end_date)
        # One file per ticker; re-running overwrites the previous download.
        data.to_csv(f"../data/raw/market data/{symbol}.csv")


def main():
    ticker = ["AAPL", "META", "ABG.JO", "MSFT"]
    start_date = '2000-01-01'
    end_date = datetime.today().strftime('%Y-%m-%d')
    download_data(ticker, start_date, end_date)


if __name__ == '__main__':
    main()

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


### Data Storage

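The data is saved in the `../data/raw/market data/` directory, with one CSV file per ticker named after the ticker symbol (for example, `AAPL.csv`). Because the filename carries no date stamp, each run overwrites the previous file, which keeps the directory easy to manage and avoids accumulating repeated versions of the same standardised data.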
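
The naming scheme described above can be sketched as a small helper. This is only an illustration: `csv_path_for` and its `base_dir` parameter are hypothetical names that do not appear in the notebook, and creating the directory on demand is an assumption added here, since `to_csv` fails when the target directory does not exist.

```python
from pathlib import Path


def csv_path_for(ticker: str, base_dir: str = "../data/raw/market data") -> Path:
    """Return the CSV path for a ticker, creating the directory if it is missing.

    Hypothetical helper: the name and the `base_dir` default are assumptions
    that mirror the directory used in this notebook.
    """
    directory = Path(base_dir)
    # parents=True builds intermediate folders; exist_ok=True makes re-runs safe.
    directory.mkdir(parents=True, exist_ok=True)
    # No date stamp in the name, so a new download replaces the old file.
    return directory / f"{ticker}.csv"
```

The `to_csv` call in `download_data` could then take the path returned by `csv_path_for` for the current ticker instead of a hand-built f-string.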

### Future Enhancements

``Data Quality Checks:`` Implement methods to verify the integrity and completeness of the data.

``Database Integration:`` Consider storing the data in a database for efficient querying and management.

``Automation:`` Automate the data collection process using scheduling tools like GitHub Actions.
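
As a rough illustration of the first two enhancements, the sketch below shows one possible shape for a basic quality report and for storing a table in SQLite. None of this exists in the notebook yet: `quality_report`, `save_to_sqlite`, the choice of SQLite, and the sample values are all assumptions made for the example, and the checks shown are a starting point rather than a complete validation.

```python
import sqlite3

import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarise basic integrity problems in a date-indexed price table.

    Hypothetical helper; the checks chosen here are assumptions.
    """
    return {
        "rows": len(df),
        # Total number of NaN cells across all columns.
        "missing_values": int(df.isna().sum().sum()),
        # Dates that appear more than once in the index.
        "duplicate_dates": int(df.index.duplicated().sum()),
        # True when the index never goes backwards in time.
        "dates_sorted": bool(df.index.is_monotonic_increasing),
    }


def save_to_sqlite(df: pd.DataFrame, table: str, db_path: str) -> None:
    """Write the table to a SQLite file, replacing any earlier copy of it.

    Hypothetical helper; SQLite is used only because it ships with Python.
    """
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table, conn, if_exists="replace", index=True, index_label="Date")
    finally:
        conn.close()


# Made-up sample with one missing value and one repeated date.
sample = pd.DataFrame(
    {"Close": [10.0, None, 10.5], "Volume": [100, 120, 90]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03"]),
)
print(quality_report(sample))
```

Running the report before saving each ticker would make gaps or duplicated dates visible early. Using `if_exists="replace"` mirrors the overwrite behaviour of the CSV files.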

### Conclusion

This notebook lays the foundation for the broader investment analysis project. By systematically collecting and storing data, we set the stage for in-depth analysis and machine learning applications. As the project evolves, we will expand the scope to include more data sources and advanced analytical techniques.

Note: This project is a work in progress, and the current implementation may be refined over time.