Group 24: Yu-Heng Chi, Param Sejpal, Jessica Villanueva, Yihan Yang, Zhiyi Zhang
The focus of this project is to measure a company’s financial health and performance trajectory based on its predicted Market Capitalization. The companies we considered are Fortune1000 companies and the metrics are gathered from financial reports in 2024.
“Is this company financially healthy?” can be answered using several factors. We performed regression to predict market capitalization ("Market Cap"), and we classified companies as healthy or non-healthy based on their status as a growth or non-growth company.
In this project, we use multiple regression models to predict a company's Market Cap using financial metrics including EBITDA, revenue, and other relevant features.
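As a rough illustration of the regression idea, the sketch below fits an ordinary least squares model that predicts Market Cap from EBITDA and revenue. It is not the project's actual pipeline: the choice of plain OLS, the helper names (`solve_linear_system`, `fit_ols`, `predict_market_cap`), and the toy numbers are all assumptions made for this example. It uses only the standard library so it runs without extra dependencies.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the caller's lists are not mutated.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ols(features, target):
    """Fit least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    # Prepend 1.0 to each row so the first weight acts as the intercept.
    rows = [[1.0] + [float(v) for v in row] for row in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_market_cap(weights, row):
    """Apply the fitted weights to one row of features."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic numbers for illustration only; they are NOT taken from FinancialData.csv.
# Each row is [EBITDA, Revenue]; the targets follow an exact linear rule by construction.
toy_features = [[1.0, 10.0], [2.0, 15.0], [3.0, 30.0], [4.0, 32.0], [5.0, 50.0]]
toy_market_cap = [2.0 + 8.0 * e + 0.5 * r for e, r in toy_features]

weights = fit_ols(toy_features, toy_market_cap)
# Approximately 77.5, because the toy targets are exactly linear in the features.
predicted = predict_market_cap(weights, [6.0, 55.0])
```

On real data, a library such as scikit-learn would usually replace the hand-written solver, and the fit would be evaluated on held-out companies rather than on the training rows.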
One measure of positive financial performance (described in Deliverable 1 as no bankruptcy or restructuring events, positive growth for certain financial metrics, and other factors) that tells us information about Market Cap is profitability. Therefore, we will classify these businesses as Profitable or Non-Profitable companies.
- Clone this repo
- Set up a virtual environment
- Install the requirements: pip install -r requirements.txt
- (Optional) The dataset we use is included in this repo, but to see how we created the data, run the data creation script, generate.py, to create the dataset
- Run the Jupyter notebook!
Many companies report EBITDA or earnings differently, which is why we built a dataset from what we determined to be the most consistent reflection of income statement and balance sheet data. FinancialData.csv contains financial data for Fortune1000 companies, compiled from several sources. Income statement and balance sheet information was accessed from:
- A Kaggle dataset (k04dRunn3r on Kaggle).
- Yahoo! Finance financials.
- 10-K reports from the EDGAR archives on SEC.gov.
This directory includes the following files.
- FinancialData.csv: the final dataset we created and are using in this project.
- generate.py: a quick script that can be run to see how the dataset was created. Financial metrics that were added to the dataset come from Yahoo! Finance.
- KaggleData.csv: The Kaggle dataset with basic company metrics (ticker, revenue, etc.) used in creating the dataset.
- Past project deliverables: this is added for reference and contains notes that we can refer to throughout the project.
- requirements.txt: installation requirements.
- Rank: company rank in the Fortune1000 list.
- Ticker: The stock symbol associated with a company.
- Sector, Industry, Type: Economic categorization a company belongs to.
- Profitable: (EBITDA Profitability) Profitability of a company, i.e., if total income outweighs expenses.
- Revenue: Sales prior to any expenses.
- Market Cap: Size of the equity portion of the business.
- Gross Profit: Profit after deducting cost of goods sold.
- EBITDA: Earnings Before Interest, Taxes, Depreciation, and Amortization. A non-GAAP, capital-structure-neutral, accrual-accounting measure of profitability. EBITDA margin represents EBITDA as a percentage of total revenue.
- Profits Percent Change and Revenue Percent Change: Percentage growth of the current year's value relative to the year before.
These metrics reflect how well the company is doing, potential sudden market events, and what investors think about a company's growth potential. More information on why these metrics were chosen and which machine learning techniques we use can be found in our Project Deliverable #1 and Project Deliverable #2 submissions (included in deliverables/).
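The derived columns described above (EBITDA margin, the percent-change columns, and the Profitable flag) can be sketched in a few lines. The function names are hypothetical, and two choices are assumptions rather than the dataset's documented behavior: treating "Profitable" as EBITDA greater than zero, and dividing by the absolute prior-year value so that the sign of the change stays meaningful when the prior value is negative.

```python
def ebitda_margin(ebitda, revenue):
    """EBITDA as a percentage of total revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero to compute a margin")
    return ebitda / revenue * 100.0


def percent_change(current, previous):
    """Growth of the current year's value relative to the year before, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    # Assumption: divide by abs(previous) so an improvement from a negative
    # base is reported as positive growth.
    return (current - previous) / abs(previous) * 100.0


def is_profitable(ebitda):
    """Assumed rule for the Profitable flag: EBITDA is positive."""
    return ebitda > 0


# Hypothetical figures for illustration only, not rows from FinancialData.csv.
margin = ebitda_margin(ebitda=25.0, revenue=200.0)
revenue_growth = percent_change(current=110.0, previous=100.0)
label = "Profitable" if is_profitable(25.0) else "Non-Profitable"
```

Both ratio helpers raise on a zero denominator instead of returning infinity, so rows with missing or zero revenue surface early rather than distorting a model downstream.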
