# Top 1000 Technology Companies Data Analysis Project

This notebook cleans, analyzes, and visualizes a dataset of the top 1000 technology companies.


# Project Brief

This project involves analyzing the dataset of the top 1000 technology companies based on their market capitalization. The goal is to clean the data, perform exploratory analysis, and generate insights that can be used for decision-making in the technology sector.

## Load and Explore the Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

file_path = '/mnt/data/Top 1000 technology companies.csv'
tech_companies_data = pd.read_csv(file_path)
tech_companies_data.head()

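This step is titled "explore", but `head()` only shows five rows. Before cleaning, it helps to know which columns have gaps. The sketch below counts missing values per column and gives each count as a percentage. `missing_value_report` is a hypothetical helper, not part of the original notebook. The small frame is made up and only stands in for `tech_companies_data`.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Number and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(1),
    })
    return report.sort_values('missing', ascending=False)


# Made-up stand-in for tech_companies_data (not values from the dataset)
toy = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D'],
    'Market Cap': ['$3.1 T', None, '$950 B', '$12 M'],
    'Country': ['United States', 'Taiwan', None, None],
})
print(missing_value_report(toy))
```

On the real data, call it as `missing_value_report(tech_companies_data)`. Running `tech_companies_data.info()` alongside it shows each column's dtype.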
## Data Cleaning
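Convert the 'Market Cap' column to numbers, drop rows whose market cap is missing or cannot be parsed, and sort the data by ranking.

In [None]:
# Clean the 'Market Cap' column: strip '$', commas and spaces, then turn the
# T/B/M suffixes into exponents so that, e.g., '$3.1 T' becomes '3.1e12'
tech_companies_data['Market Cap'] = (
    tech_companies_data['Market Cap']
    .astype(str)
    .str.replace(r'[$,\s]', '', regex=True)
    .replace({'T': 'e12', 'B': 'e9', 'M': 'e6'}, regex=True)
)
# Values that still cannot be parsed become NaN; drop them with the missing ones
tech_companies_data['Market Cap'] = pd.to_numeric(tech_companies_data['Market Cap'], errors='coerce')
tech_companies_data = tech_companies_data.dropna(subset=['Market Cap'])

# Sort the data by 'Ranking'
sorted_tech_companies_data = tech_companies_data.sort_values(by='Ranking')
sorted_tech_companies_data.head()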

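The vectorized replacement in the cleaning cell assumes every value looks like `$<number> <T|B|M>`. Values that break this pattern cannot be converted reliably. A per-value parser makes the assumption explicit and returns NaN for anything it does not recognize. This is a minimal sketch that assumes that format. `parse_market_cap` and `SUFFIX_MULTIPLIERS` are hypothetical names.

```python
import re

import numpy as np

# Multipliers for the suffixes the cleaning cell handles (assumed to be the only ones)
SUFFIX_MULTIPLIERS = {'T': 1e12, 'B': 1e9, 'M': 1e6}


def parse_market_cap(value) -> float:
    """Convert a string such as '$3.1 T' to dollars; return NaN if it does not match."""
    if value is None or (isinstance(value, float) and np.isnan(value)):
        return np.nan
    text = str(value).replace('$', '').replace(',', '').replace(' ', '')
    # A number followed by an optional T/B/M suffix, and nothing else
    match = re.fullmatch(r'(\d*\.?\d+)([TBM]?)', text)
    if match is None:
        return np.nan
    number, suffix = match.groups()
    return float(number) * SUFFIX_MULTIPLIERS.get(suffix, 1.0)


print(parse_market_cap('$950 B'))  # 950000000000.0
```

To apply it to the raw column, use `tech_companies_data['Market Cap'].map(parse_market_cap)`. `map` calls Python once per row, which is slower than string methods but not a concern at 1000 rows.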
## Summary of Findings


After cleaning the data and performing exploratory analysis, several key insights were uncovered:

- **Apple Inc.** is the largest technology company by market capitalization, followed closely by **Microsoft Corporation** and **Nvidia Corporation**.
- The majority of the top companies are based in the **United States**, highlighting the dominance of U.S. companies in the global tech market.
- The **Semiconductors** industry is particularly well-represented in the top ranks, reflecting the critical role of semiconductor companies in the tech sector.

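The second and third findings are claims about proportions, so a short calculation can check them instead of relying on a chart. The sketch below assumes a few things not verified here: the `Country` and `Industry` columns use spellings such as `'United States'` and `'Semiconductors'`, and a lower `Ranking` means a larger company. `share_in_top_ranks` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def share_in_top_ranks(df: pd.DataFrame, column: str, value: str, n=None) -> float:
    """Fraction of companies (optionally only the n best-ranked) whose `column` equals `value`."""
    subset = df.sort_values('Ranking')  # assumes Ranking 1 is the largest company
    if n is not None:
        subset = subset.head(n)
    return float((subset[column] == value).mean())


# Made-up rows for illustration only
toy = pd.DataFrame({
    'Ranking': [1, 2, 3, 4, 5],
    'Country': ['United States', 'United States', 'Taiwan', 'United States', 'Netherlands'],
    'Industry': ['Semiconductors', 'Software', 'Semiconductors', 'Semiconductors', 'Semiconductors'],
})
print(share_in_top_ranks(toy, 'Country', 'United States'))
print(share_in_top_ranks(toy, 'Industry', 'Semiconductors', n=3))
```

For example, a value above 0.5 from `share_in_top_ranks(sorted_tech_companies_data, 'Country', 'United States')` would support the "majority" statement.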
## Recommendations


Based on the analysis, the following recommendations are suggested:

1. **Investment Opportunities**: Investors may want to consider the semiconductor industry due to its significant representation among the top companies.
2. **Geographic Expansion**: Non-U.S. companies could explore opportunities to increase their presence in the U.S. market.
3. **Market Research**: Conduct further market research to understand the growth drivers behind the leading companies in the technology sector.

## Extensive Data Analysis and Visualizations

In [None]:
# Summary Statistics
summary_stats = sorted_tech_companies_data.describe()
summary_stats

In [None]:
# Distribution of Market Cap
plt.figure(figsize=(10, 6))
sns.histplot(sorted_tech_companies_data['Market Cap'], bins=30, kde=True)
plt.title('Distribution of Market Cap')
plt.xlabel('Market Cap (in $)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

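Market caps are recorded with M, B and T suffixes, so they span several orders of magnitude. On a linear axis, most companies therefore fall into the first one or two bins. A log-scaled histogram with logarithmically spaced bins spreads them out. Below is a minimal matplotlib sketch. `plot_log_histogram` is a hypothetical helper, and seaborn's `histplot(..., log_scale=True)` is an alternative.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_log_histogram(values, bins=30):
    """Histogram with logarithmically spaced bins on a log-scaled x-axis."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values) & (values > 0)]  # a log axis needs positive values
    edges = np.logspace(np.log10(values.min()), np.log10(values.max()), bins + 1)
    edges[0], edges[-1] = values.min(), values.max()  # guard against round-off at the ends
    fig, ax = plt.subplots(figsize=(10, 6))
    counts, _, _ = ax.hist(values, bins=edges)
    ax.set_xscale('log')
    ax.set_title('Distribution of Market Cap (log scale)')
    ax.set_xlabel('Market Cap (in $, log scale)')
    ax.set_ylabel('Frequency')
    return ax, counts
```

Usage in the notebook: `ax, counts = plot_log_histogram(sorted_tech_companies_data['Market Cap'])` followed by `plt.show()`. Missing and non-positive values are dropped before binning.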
In [None]:
# Market Cap by Country
plt.figure(figsize=(14, 7))
country_market_cap = sorted_tech_companies_data.groupby('Country')['Market Cap'].sum().sort_values(ascending=False).head(20)
sns.barplot(x=country_market_cap.values, y=country_market_cap.index, palette='coolwarm')
plt.title('Top 20 Countries by Total Market Cap')
plt.xlabel('Total Market Cap (in $)')
plt.ylabel('Country')
plt.grid(True)
plt.show()

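Summed market cap per country is easier to compare as a share of the total. The sketch below returns both figures. The percentages are relative to the total over all groups, not only the rows kept by `top`. `market_cap_share` is a hypothetical helper and the numbers are made up.

```python
import pandas as pd


def market_cap_share(df: pd.DataFrame, by: str, top: int = 20) -> pd.DataFrame:
    """Total market cap per group and its percentage of the overall total, largest first."""
    totals = df.groupby(by)['Market Cap'].sum().sort_values(ascending=False)
    result = pd.DataFrame({
        'total_market_cap': totals,
        'share_pct': totals / totals.sum() * 100,  # share of ALL groups, not just the top rows
    })
    return result.head(top)


# Made-up values in trillions of dollars, for illustration only
toy = pd.DataFrame({
    'Country': ['United States', 'Taiwan', 'United States', 'Netherlands'],
    'Market Cap': [3.0, 1.0, 4.0, 2.0],
})
print(market_cap_share(toy, 'Country', top=2))
```

`market_cap_share(sorted_tech_companies_data, 'Country')` covers the same 20 countries as the chart.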
In [None]:
# Market Cap by Sector
plt.figure(figsize=(14, 7))
sector_market_cap = sorted_tech_companies_data.groupby('Sector')['Market Cap'].sum().sort_values(ascending=False)
sns.barplot(x=sector_market_cap.values, y=sector_market_cap.index, palette='viridis')
plt.title('Market Cap by Sector')
plt.xlabel('Total Market Cap (in $)')
plt.ylabel('Sector')
plt.grid(True)
plt.show()

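One or two very large companies can dominate a sector's total market cap. Adding the company count and the median market cap shows whether a sector is large because it has many firms or a few giants. This sketch uses pandas named aggregation. `sector_profile` is hypothetical and the frame is made up.

```python
import pandas as pd


def sector_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Company count, total and median market cap per sector, sorted by total."""
    return (
        df.groupby('Sector')
        .agg(
            companies=('Market Cap', 'count'),
            total_market_cap=('Market Cap', 'sum'),
            median_market_cap=('Market Cap', 'median'),
        )
        .sort_values('total_market_cap', ascending=False)
    )


# Made-up values in trillions of dollars, for illustration only
toy = pd.DataFrame({
    'Sector': ['Technology', 'Technology', 'Communication Services', 'Technology'],
    'Market Cap': [3.0, 1.0, 2.0, 5.0],
})
print(sector_profile(toy))
```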
In [None]:
# Top 10 Companies by Market Cap (converted to trillions of dollars for the axis)
top_10_companies = sorted_tech_companies_data.nlargest(10, 'Market Cap')
plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_companies['Market Cap'] / 1e12, y=top_10_companies['Company'], palette='magma')
plt.title('Top 10 Technology Companies by Market Capitalization')
plt.xlabel('Market Cap (in trillions of $)')
plt.ylabel('Company')
plt.grid(True)
plt.show()

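The top-10 chart hints at how much of the total value sits with a handful of firms. One way to quantify this is the share of total market cap held by the n largest companies. Below is a minimal sketch with a hypothetical `top_n_concentration` helper.

```python
import pandas as pd


def top_n_concentration(market_caps, n: int = 10) -> float:
    """Fraction (0 to 1) of the summed market cap held by the n largest values."""
    caps = pd.Series(market_caps, dtype=float).dropna()
    return float(caps.nlargest(n).sum() / caps.sum())


print(top_n_concentration([5.0, 3.0, 1.0, 1.0], n=2))  # made-up values
```

For the notebook's data, call `top_n_concentration(sorted_tech_companies_data['Market Cap'], n=10)`.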
In [None]:
# Sector distribution
plt.figure(figsize=(10, 6))
sns.countplot(
    y='Sector',
    data=sorted_tech_companies_data,
    order=sorted_tech_companies_data['Sector'].value_counts().index,
    palette='cubehelix',
)
plt.title('Distribution of Companies by Sector')
plt.xlabel('Number of Companies')
plt.ylabel('Sector')
plt.grid(True)
plt.show()

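Sector counts alone do not show where the companies in each sector are based. A cross-tabulation of sector against country, restricted to the most common countries, adds that dimension and bears on the finding about U.S. dominance. This sketch uses `pd.crosstab`. `sector_by_country` is hypothetical and the rows are made up.

```python
import pandas as pd


def sector_by_country(df: pd.DataFrame, top_countries: int = 5) -> pd.DataFrame:
    """Company counts per sector (rows) and country (columns) for the most common countries."""
    top = df['Country'].value_counts().head(top_countries).index
    subset = df[df['Country'].isin(top)]
    return pd.crosstab(subset['Sector'], subset['Country'])


# Made-up rows for illustration only
toy = pd.DataFrame({
    'Country': ['United States', 'United States', 'Taiwan', 'Netherlands', 'Taiwan'],
    'Sector': ['Technology', 'Technology', 'Technology', 'Technology', 'Communication Services'],
})
print(sector_by_country(toy, top_countries=2))
```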
In [None]:
# Industry distribution
plt.figure(figsize=(10, 6))
sns.countplot(
    y='Industry',
    data=sorted_tech_companies_data,
    order=sorted_tech_companies_data['Industry'].value_counts().index[:15],
    palette='crest',
)
plt.title('Top 15 Industries by Number of Companies')
plt.xlabel('Number of Companies')
plt.ylabel('Industry')
plt.grid(True)
plt.show()

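The industry chart keeps only the 15 most common industries, which hides how many companies fall outside them. Pooling the remainder into an "Other" row keeps the total visible. Below is a sketch with a hypothetical `top_n_with_other` helper and made-up labels.

```python
import pandas as pd


def top_n_with_other(values: pd.Series, n: int = 15, other_label: str = 'Other') -> pd.Series:
    """Counts of the n most common values, with every remaining value pooled into one row."""
    counts = values.value_counts()
    top = counts.head(n)
    remainder = counts.iloc[n:].sum()
    if remainder > 0:
        top = pd.concat([top, pd.Series({other_label: remainder})])
    return top


# Made-up industry labels for illustration only
toy = pd.Series(['Semiconductors'] * 3 + ['Software'] * 2 + ['IT Services', 'Hardware'])
print(top_n_with_other(toy, n=2))
```

The result can be drawn in the notebook's style with `sns.barplot(x=result.values, y=result.index)`.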
In [None]:
# Correlation matrix
correlation = sorted_tech_companies_data[['Ranking', 'Market Cap']].corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()

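The matrix above uses Pearson correlation, which measures linear association. `Ranking` is ordinal and market cap is heavily skewed, so a rank-based (Spearman) correlation is often a better fit. If `Ranking` is ordered by market cap (assumed here, with 1 as the largest company), the Spearman value should be close to -1 even when the Pearson value is weaker. Below is a sketch on made-up values. It uses `DataFrame.corr`, which computes Spearman without SciPy.

```python
import pandas as pd

# Made-up values for illustration only; Ranking 1 is assumed to be the largest company
toy = pd.DataFrame({
    'Ranking': [1, 2, 3, 4],
    'Market Cap': [3.0e12, 2.9e12, 5.0e10, 1.0e9],
})
pearson = toy.corr(method='pearson').loc['Ranking', 'Market Cap']    # linear association
spearman = toy.corr(method='spearman').loc['Ranking', 'Market Cap']  # rank (monotonic) association
print(f'Pearson: {pearson:.3f}, Spearman: {spearman:.3f}')
```

On the notebook's data, `sorted_tech_companies_data[['Ranking', 'Market Cap']].corr(method='spearman')` can go into the same `sns.heatmap` call.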
## Save Cleaned Data
Finally, save the cleaned dataset for further use.

In [None]:
cleaned_excel_path = '/mnt/data/cleaned_tech_companies_data.xlsx'
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
sorted_tech_companies_data.to_excel(cleaned_excel_path, index=False)
print(f'Cleaned data saved to: {cleaned_excel_path}')