
# **China vs Japan**

This dataset compares technological development in China and Japan across sectors such as artificial intelligence, robotics, telecommunications, and clean energy. It is a useful starting point for researchers, students, and enthusiasts who want to explore the technological trajectories of these two leading global economies. 📊

Each record describes one country, year, and tech sector, with indicators such as market share, R&D investment, number of patents filed, venture capital funding, internet penetration, and 5G network coverage. 🇨🇳 vs 🇯🇵

## **Importing Libraries**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px


## **Importing Dataset**

In [2]:
df = pd.read_csv('Big_Japan_vs_China_Technology.csv')

In [3]:
df.head()

,Country,Year,Tech Sector,Market Share (%),R&D Investment (in USD),Number of Patents Filed (Annual),Number of Tech Companies,Tech Exports (in USD),Number of Startups,Venture Capital Funding (in USD),Global Innovation Ranking,Internet Penetration (%),5G Network Coverage (%),University Research Collaborations,Top Tech Products Exported,Number of Tech Workers
0,China,2001,Software,22.279014,83772080000.0,1415,878,35031550000.0,166,45026940000.0,11,57.088673,82.240272,50,Robots,621221
1,Japan,2011,Semiconductor,31.899013,35511340000.0,7899,364,37142090000.0,217,11473810000.0,14,78.17209,48.552982,134,5G Equipment,431928
2,Japan,2009,Robotics,33.574466,84809480000.0,3749,425,157040600000.0,451,5498885000.0,14,55.810668,66.495286,58,Semiconductors,55776
3,Japan,2019,Cloud Computing,24.904248,22678210000.0,3841,62,103128400000.0,264,21862780000.0,10,78.553714,28.807251,150,Robots,267852
4,China,2002,Robotics,46.975827,34536550000.0,1704,458,111205600000.0,463,40982820000.0,10,70.427548,38.746268,74,AI Chips,654162


## **Data Wrangling**

In [11]:
df.shape

(1000, 16)

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Country                             1000 non-null   object 
 1   Year                                1000 non-null   int64  
 2   Tech Sector                         1000 non-null   object 
 3   Market Share (%)                    1000 non-null   float64
 4   R&D Investment (in USD)             1000 non-null   float64
 5   Number of Patents Filed (Annual)    1000 non-null   int64  
 6   Number of Tech Companies            1000 non-null   int64  
 7   Tech Exports (in USD)               1000 non-null   float64
 8   Number of Startups                  1000 non-null   int64  
 9   Venture Capital Funding (in USD)    1000 non-null   float64
 10  Global Innovation Ranking           1000 non-null   int64  
 11  Internet Penetration (%)            1000 non-null   float64

In [13]:
df.describe()

,Year,Market Share (%),R&D Investment (in USD),Number of Patents Filed (Annual),Number of Tech Companies,Tech Exports (in USD),Number of Startups,Venture Capital Funding (in USD),Global Innovation Ranking,Internet Penetration (%),5G Network Coverage (%),University Research Collaborations,Number of Tech Workers
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,2011.497,27.439221,50345910000.0,5094.686,1011.133,99052790000.0,257.524,24672340000.0,10.003,69.739498,49.970784,99.084,496483.411
std,6.991486,12.808159,28297040000.0,2846.164392,563.223321,59211770000.0,143.525939,14693300000.0,5.533318,17.071675,23.012153,55.374319,290405.281594
min,2000.0,5.033959,1056575000.0,119.0,52.0,1072723000.0,11.0,36814490.0,1.0,40.07913,10.043427,1.0,2422.0
25%,2005.0,16.333641,25582910000.0,2789.5,503.0,46052660000.0,131.0,11953720000.0,5.0,54.879624,29.796081,53.0,248962.5
50%,2012.0,27.111042,50100750000.0,5021.0,998.5,96710710000.0,257.5,24538260000.0,10.0,70.257998,51.150191,100.0,505236.0
75%,2017.25,38.480568,74928960000.0,7569.5,1494.25,152049100000.0,382.25,37115080000.0,15.0,84.408627,69.795189,147.0,737879.25
max,2023.0,49.955344,99975370000.0,9989.0,1999.0,199728000000.0,499.0,49917080000.0,19.0,99.935228,89.90458,199.0,996999.0
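
Because the notebook compares two countries, the pooled statistics above can also be split by country. The following is a minimal sketch, assuming a small `sample` frame built from the five rows shown by `df.head()`; the names `sample` and `by_country` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Illustrative sample: the five rows shown by df.head(), trimmed to three columns.
sample = pd.DataFrame({
    "Country": ["China", "Japan", "Japan", "Japan", "China"],
    "Market Share (%)": [22.279014, 31.899013, 33.574466, 24.904248, 46.975827],
    "Number of Patents Filed (Annual)": [1415, 7899, 3749, 3841, 1704],
})

# Mean and median of each numeric indicator, computed separately per country.
by_country = sample.groupby("Country").agg(["mean", "median"])
print(by_country.round(2))
```

On the full `df`, the same call would likely need the numeric columns selected first (for example `df.groupby("Country")[numeric_cols]`, where `numeric_cols` is a hypothetical list), since text columns such as `Tech Sector` cannot be averaged.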


In [14]:
df.isnull().sum()

Country,0
Year,0
Tech Sector,0
Market Share (%),0
R&D Investment (in USD),0
Number of Patents Filed (Annual),0
Number of Tech Companies,0
Tech Exports (in USD),0
Number of Startups,0
Venture Capital Funding (in USD),0


In [15]:
df.duplicated().sum()

0

In [16]:
df.drop_duplicates(inplace=True)  # no-op here: the check above found 0 duplicate rows

In [17]:
df.columns

Index(['Country', 'Year', 'Tech Sector', 'Market Share (%)',
       'R&D Investment (in USD)', 'Number of Patents Filed (Annual)',
       'Number of Tech Companies', 'Tech Exports (in USD)',
       'Number of Startups', 'Venture Capital Funding (in USD)',
       'Global Innovation Ranking', 'Internet Penetration (%)',
       '5G Network Coverage (%)', 'University Research Collaborations',
       'Top Tech Products Exported', 'Number of Tech Workers'],
      dtype='object')

In [19]:
df["Year"].nunique()

24
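
The count of 24 matches the range reported by `describe()`: `Year` runs from 2000 to 2023, which is 24 years inclusive, so no year in that range is missing. A small sketch of that check follows, assuming the five `Year` values from `df.head()` as stand-in data; the names `years`, `expected`, and `missing` are illustrative.

```python
import pandas as pd

# Stand-in data: the Year values of the five rows shown by df.head().
years = pd.Series([2001, 2011, 2009, 2019, 2002], name="Year")

# Every year between the reported minimum (2000) and maximum (2023), inclusive.
expected = set(range(2000, 2024))

# Years in the expected range that do not appear in the data.
missing = sorted(expected - set(years))

print(len(expected))  # number of years in the full range
print(missing)
```

With the full dataset, replacing `years` with `df["Year"]` should give an empty `missing` list, since 24 distinct values inside a 24-year range leave no gaps.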

## **Exploratory Data Analysis**

In [27]:
# Share of records per country (each row counts once)
fig = px.pie(df, names='Country', hole=0.5)
fig.show()

In [29]:
# Slices show each country's share of the summed "Market Share (%)" values across all rows
fig = px.pie(df, names="Country", values="Market Share (%)", color="Country", hole=0.5)
fig.show()

In [31]:
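# Aggregate first: with raw rows, each bar would stack every record for that year
share_by_year = df.groupby(["Year", "Country"], as_index=False)["Market Share (%)"].mean()
fig = px.bar(share_by_year, x="Year", y="Market Share (%)", color="Country", barmode="group",
             title="Average Market Share by Year and Country")
fig.show()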

In [32]:
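# Average per year and country so each line has one point per year, in year order
trend = df.groupby(["Year", "Country"], as_index=False)["Market Share (%)"].mean()
fig = px.line(trend, x="Year", y="Market Share (%)", color="Country",
              title="Average Market Share Trend for Each Country")
fig.show()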

In [33]:
fig = px.box(df, x="Country", y="Market Share (%)", title="Market Share Distribution by Country")
fig.show()

In [34]:
fig = px.violin(df, x="Country", y="Market Share (%)", title="Market Share Density by Country")
fig.show()

In [37]:
fig = px.scatter(df, x="Market Share (%)", y="Venture Capital Funding (in USD)",
                 color="Country", title="Market Share vs. Venture Capital Funding")
fig.show()
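
A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation per country, assuming a `sample` frame built from the five rows shown by `df.head()`. The names `sample` and `correlations` are illustrative, and five rows are far too few to draw conclusions from.

```python
import pandas as pd

# Illustrative sample: the five rows shown by df.head(), trimmed to three columns.
sample = pd.DataFrame({
    "Country": ["China", "Japan", "Japan", "Japan", "China"],
    "Market Share (%)": [22.279014, 31.899013, 33.574466, 24.904248, 46.975827],
    "Venture Capital Funding (in USD)": [45026940000.0, 11473810000.0, 5498885000.0,
                                         21862780000.0, 40982820000.0],
})

x, y = "Market Share (%)", "Venture Capital Funding (in USD)"

# Pearson correlation between the two columns, computed separately per country.
correlations = {
    country: group[x].corr(group[y])
    for country, group in sample.groupby("Country")
}

for country, r in correlations.items():
    print(f"{country}: r = {r:.3f}")
```

Swapping `sample` for the full `df` applies the same calculation to all 1000 rows.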
