### TITLE: THE GROWTH OF ARTIFICIAL INTELLIGENCE Through Private Investments: Forecasting the Next Years of Investment Trends in AI
AUTHOR: IMAN MOHAMED

GEORGIA STATE UNIVERSITY, INSTITUTE FOR INSIGHT, IMOHAMED7@STUDENT.GSU.EDU

### DATA SOURCE

**Link to Data Card**

*   https://drive.google.com/drive/folders/1ma9WZJzKreS8f2It1rMy_KkkbX6XwDOK


**Source**

*   https://ourworldindata.org/artificial-intelligence


**Underlying Data Sources**

Charlie Giattino, Edouard Mathieu, Veronika Samborska and Max Roser (2023) - “Artificial Intelligence” Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/artificial-intelligence' [Online Resource]

### Libraries

In [None]:
!pip install plotly


# Import the pandas library and alias it as 'pd' for data manipulation
import pandas as pd

# Import the NumPy library and alias it as 'np' for numerical operations
import numpy as np

# Import the seaborn library for data visualization
import seaborn as sns

# Import norm (the normal distribution) from scipy.stats for statistical work
from scipy.stats import norm

# Import the pyplot module from matplotlib and alias it as 'plt' for plotting
import matplotlib.pyplot as plt

# Enable inline plotting for Jupyter Notebook
%matplotlib inline

# Import the pathlib module to work with file paths
from pathlib import Path

# Import plotly graph objects as 'go' and plotly express as 'px' for interactive plotting
import plotly.graph_objs as go
import plotly.express as px




In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### UPLOADING & CLEANING THE DATA

In [None]:
df_Totals = pd.read_csv("/content/drive/MyDrive/Internal_Sprint/private-investment-in-AI-focus-area.csv")
print(f"Number of records: {df_Totals.shape[0]:,}\nNumber of columns: {df_Totals.shape[1]:,}")

Number of records: 160
Number of columns: 7


In [None]:
df_Totals.describe(include='all').T

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Entity,160.0,26.0,Total,10.0,,,,,,,
Code,0.0,,,,,,,,,,
Year,160.0,,,,2019.375,1.872509,2013.0,2018.0,2019.0,2021.0,2022.0
World,160.0,,,,4774577575.1375,14142466531.992706,14814431.0,636550857.0,1497969872.0,3361576440.75,125356874235.0
European Union and United Kingdom,160.0,,,,469283714.8875,1459439592.717097,0.0,32701037.5,94379972.0,259983859.0,12500782453.0
China,160.0,,,,1106502952.70625,3012094840.977515,0.0,42766928.25,234287057.0,846222902.25,22848295305.0
United States,160.0,,,,2637807508.28125,7897087798.07303,0.0,282865055.0,663885885.0,1797540978.5,73395940859.0


In [None]:
# display datatype for column/variable names
df_Totals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160 entries, 0 to 159
Data columns (total 7 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Entity                             160 non-null    object 
 1   Code                               0 non-null      float64
 2   Year                               160 non-null    int64  
 3   World                              160 non-null    int64  
 4   European Union and United Kingdom  160 non-null    int64  
 5   China                              160 non-null    int64  
 6   United States                      160 non-null    int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 8.9+ KB


In [None]:
# review first 10 records in the data and print them
print("\nFirst 10 records in the data are:")
df_Totals.head(10)


First 10 records in the data are:


Unnamed: 0,Entity,Code,Year,World,European Union and United Kingdom,China,United States
0,AI ventures,,2017,705626292,34720055,233168581,404163765
1,AI ventures,,2018,1539250299,30321937,591347599,857474812
2,AI ventures,,2019,1212544984,15388467,93042361,962517568
3,AI ventures,,2020,1545821073,5170265,91954563,1358836062
4,AI ventures,,2021,2379370717,1013359242,258977775,923024967
5,AI ventures,,2022,1242838005,14814431,451988999,641729703
6,Agricultural technology,,2017,309478503,7554717,10391318,261846503
7,Agricultural technology,,2018,280024054,33194058,0,194254290
8,Agricultural technology,,2019,502672339,116117865,18018197,283437548
9,Agricultural technology,,2020,686735808,25459779,7959031,624487776


In [None]:
# rename the columns
df_Totals.rename(columns={'European Union and United Kingdom':'EU & UK', 'United States':'USA','Entity':'Private Sector'},inplace=True)

In [None]:
df_Totals = df_Totals.drop('Code', axis=1)

In [None]:
df_Totals.head(10)

Unnamed: 0,Private Sector,Year,World,EU & UK,China,USA
0,AI ventures,2017,705626292,34720055,233168581,404163765
1,AI ventures,2018,1539250299,30321937,591347599,857474812
2,AI ventures,2019,1212544984,15388467,93042361,962517568
3,AI ventures,2020,1545821073,5170265,91954563,1358836062
4,AI ventures,2021,2379370717,1013359242,258977775,923024967
5,AI ventures,2022,1242838005,14814431,451988999,641729703
6,Agricultural technology,2017,309478503,7554717,10391318,261846503
7,Agricultural technology,2018,280024054,33194058,0,194254290
8,Agricultural technology,2019,502672339,116117865,18018197,283437548
9,Agricultural technology,2020,686735808,25459779,7959031,624487776


### HANDLING MISSING VALUES

In [None]:
# Count the number of missing values in each column
print("The length of rows:", len(df_Totals))
print("Missing World:", df_Totals["World"].isna().sum())
print("Missing EU & UK:", df_Totals["EU & UK"].isna().sum())
print("Missing China:", df_Totals["China"].isna().sum())
print("Missing USA:", df_Totals["USA"].isna().sum())

The length of rows: 160
Missing World: 0
Missing EU & UK: 0
Missing China: 0
Missing USA: 0


In [None]:
# Viewed in Excel, the data visibly contains missing values.
# I will fill the missing values of the selected private sector with that sector's average.

# average_EU_UK = df_PI['EU & UK'].mean()
# average_China = df_PI['China'].mean()
# average_World = df_PI['World'].mean()
# average_USA = df_PI['USA'].mean()

# print('EU & UK average:', average_EU_UK)
# print('China average:', average_China)
# print('USA average:', average_USA)
# print('World average:', average_World)

In [None]:
# # Replace missing values with the calculated averages
# df_PI['EU & UK'].replace(0, average_EU_UK, inplace=True)
# df_PI['China'].replace(0, average_China, inplace=True)
# df_PI['USA'].replace(0, average_USA, inplace=True)
# df_PI['World'].replace(0, average_World, inplace=True)

### UNDERSTANDING THE DATA

In [None]:
unique_private_sectors = df_Totals['Private Sector'].unique()

for sector in unique_private_sectors:
    print(sector)

AI ventures
Agricultural technology
Augmented or virtual reality
Cybersecurity
Data management
Drones
Educational technology
Energy, oil and gas
Entertainment
Facial recognition
Financial technology
Fitness and wellness
Geospatial
Human Resources technology
Industrial automation
Insurance technology
Legal technology
Marketing and digital ads
Medical and healthcare
Music and video content
Natural Language Processing, customer support
Retail
Sales enablement
Semiconductors
Total
Venture capital


In [None]:
# Filter the data for years 2013-2022 and exclude 'Total' from unique private sectors
filtered_data = df_Totals[(df_Totals['Year'].between(2013, 2022)) & (df_Totals['Private Sector'] != 'Total')]

# Group by 'Private Sector' and calculate the sum of values for 'World', 'EU & UK', 'China', and 'USA'
sector_sum = filtered_data.groupby('Private Sector').agg({'World': 'sum', 'EU & UK': 'sum', 'China': 'sum', 'USA': 'sum'})

# Sort sectors based on the sum of values across 'World', 'EU & UK', 'China', and 'USA'
sorted_sectors = sector_sum.sum(axis=1).sort_values(ascending=False)

# Select the top sectors, e.g., top 10
top_sectors = sorted_sectors.index[:10]

print(top_sectors)

Index(['Data management', 'Medical and healthcare', 'Financial technology',
       'Cybersecurity', 'Sales enablement', 'Retail', 'Educational technology',
       'Music and video content', 'Industrial automation',
       'Insurance technology'],
      dtype='object', name='Private Sector')
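
The ranking above adds `World` to the three regional columns. This notebook notes elsewhere that `World` already includes those regions, so the sum counts the regional amounts twice. A hedged alternative is to rank on `World` alone. The sketch below uses four `World` totals copied from the `sector_sum` printout in this notebook; `sector_world` and `top3_by_world` are names introduced here for illustration.

```python
import pandas as pd

# World totals for four sectors, copied from the sector_sum printout.
sector_world = pd.Series(
    {
        "AI ventures": 8625451370,
        "Cybersecurity": 23514422205,
        "Data management": 37410374246,
        "Financial technology": 29960594192,
    },
    name="World",
)

# Rank on the World column only, so regional amounts are not counted twice.
top3_by_world = sector_world.sort_values(ascending=False).index[:3]

print(list(top3_by_world))
```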


In [None]:
# Select the top sectors, e.g., top 3
top3_sectors = sorted_sectors.index[:3]
print(top3_sectors)

Index(['Data management', 'Medical and healthcare', 'Financial technology'], dtype='object', name='Private Sector')


In [None]:
print(sector_sum)

                                                     World     EU & UK  \
Private Sector                                                           
AI ventures                                     8625451370  1113774397   
Agricultural technology                         3940226567   728953047   
Augmented or virtual reality                    6060971792   162584053   
Cybersecurity                                  23514422205  1201259525   
Data management                                37410374246  2471204517   
Drones                                          3209360405   120661749   
Educational technology                         17088651756   290704608   
Energy, oil and gas                            11837810373   594160930   
Entertainment                                   8172353545   572378004   
Facial recognition                              2424419243   201900631   
Financial technology                           29960594192  2837794128   
Fitness and wellness                  

### OVERVIEW


In [None]:
#'Total' Graphed

# Filter the data for "Total" within the private sector
total_data = df_Totals[df_Totals['Private Sector'] == 'Total']

# Create the first graph for "Total" data
fig1 = px.line(total_data, x='Year', y=['World', 'EU & UK', 'China', 'USA'], line_dash='Private Sector', markers=True)

# Customize the plot layout
fig1.update_layout(
    title='Annual Private Investment Totals in AI from 2013-2022',
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the first plot
fig1.show()

### A CLOSER LOOK: TOP SECTORS GRAPHED

In [None]:
#Same dataset uploaded excluding the "Total" rows from years 2017-2022 (to rule out duplicates within the data)

df_PI = pd.read_csv("/content/drive/MyDrive/Internal_Sprint/'17-22 Total excluded_private-investment-in-artificial-intelligence-by-focus-area.csv")
print(f"Number of records: {df_PI.shape[0]:,}\nNumber of columns: {df_PI.shape[1]:,}")

Number of records: 154
Number of columns: 7


In [None]:
# rename the columns
df_PI.rename(columns={'European Union and United Kingdom':'EU & UK', 'United States':'USA','Entity':'Private Sector'},inplace=True)

In [None]:
df_PI = df_PI.drop('Code', axis=1)
df_PI.head(2)

Unnamed: 0,Private Sector,Year,World,EU & UK,China,USA
0,AI ventures,2017,705626292,34720055,233168581,404163765
1,AI ventures,2018,1539250299,30321937,591347599,857474812


In [None]:
# Graphed: The top 3 sectors where Private Investments were made in the variables of "USA", "China", "EU & UK"
# "World" is excluded because it includes the totals from the other variables (USA, China, EU & UK) and other world regions not mentioned in the study

fig = px.line(filtered_data[filtered_data['Private Sector'].isin(top3_sectors)],
              x='Year',
              y=['EU & UK', 'China', 'USA'],
              markers=True,
              facet_col='Private Sector',  # Create separate graphs for each Private Sector
              facet_col_wrap=2,  # Number of columns in the facet grid
              color_discrete_map={
                  'World': 'red',  # Choose your desired color for 'World'
                  'EU & UK': 'blue',  # Choose your desired color for 'EU & UK'
                  'China': 'green',  # Choose your desired color for 'China'
                  'USA': 'purple'  # Choose your desired color for 'USA'
              },
              color_discrete_sequence=['red', 'blue', 'green', 'purple']  # Assign a sequence of colors to the lines

              )

# Customize the plot
fig.update_layout(
    title='Private Sector Data Over Time',
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the plot
fig.show()

### USA v. China: Top 10 Sectors

In [None]:
# Filter the data for the top 10 sectors
top_sectors_data = filtered_data[filtered_data['Private Sector'].isin(top_sectors)]

# Calculate the average yearly investment per sector for China and the USA within the top 10 sectors
average_by_country = top_sectors_data.groupby('Private Sector').agg({'China': 'mean', 'USA': 'mean'})

print(average_by_country)


                                China           USA
Private Sector                                     
Cybersecurity            9.868879e+08  2.289687e+09
Data management          1.502681e+09  3.773174e+09
Educational technology   1.851803e+09  5.351611e+08
Financial technology     1.318578e+09  2.588036e+09
Industrial automation    6.135292e+08  1.178275e+09
Insurance technology     8.346663e+08  9.571610e+08
Medical and healthcare   4.051564e+08  4.281886e+09
Music and video content  8.297374e+08  1.083461e+09
Retail                   1.852845e+08  1.696553e+09
Sales enablement         1.563208e+09  1.388325e+09


In [None]:
# Filter the data for the top 10 sectors
top_sectors_data = filtered_data[filtered_data['Private Sector'].isin(top_sectors)]

# Convert the 'USA' values to billions
top_sectors_data['USA_Billions'] = top_sectors_data['USA'] / 1e9

# Convert the 'China' values to millions
top_sectors_data['China_Millions'] = top_sectors_data['China'] / 1e6

# Calculate the average yearly investment per sector for China (millions) and the USA (billions)
average_by_country = top_sectors_data.groupby('Private Sector').agg({'China_Millions': 'mean', 'USA_Billions': 'mean'})

print(average_by_country)


                         China_Millions  USA_Billions
Private Sector                                       
Cybersecurity                986.887864      2.289687
Data management             1502.680557      3.773174
Educational technology      1851.803443      0.535161
Financial technology        1318.577632      2.588036
Industrial automation        613.529159      1.178275
Insurance technology         834.666298      0.957161
Medical and healthcare       405.156422      4.281886
Music and video content      829.737366      1.083461
Retail                       185.284452      1.696553
Sales enablement            1563.208370      1.388325






A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
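
The warning above is raised because new columns are assigned to a filtered slice of another DataFrame. A minimal sketch of one common way to avoid it is to take an explicit `.copy()` of the filtered rows before adding columns. The two rows are copied from the AI ventures records shown earlier; `source` and `subset` are illustrative names, not variables from the notebook.

```python
import pandas as pd

# Two rows copied from the AI ventures records shown earlier in the notebook.
source = pd.DataFrame({
    "Private Sector": ["AI ventures", "AI ventures"],
    "Year": [2017, 2018],
    "China": [233168581, 591347599],
    "USA": [404163765, 857474812],
})

# .copy() makes the filtered result an independent frame, so adding
# columns to it is unambiguous and leaves the source frame untouched.
subset = source[source["Year"] >= 2018].copy()
subset["USA_Billions"] = subset["USA"] / 1e9
subset["China_Millions"] = subset["China"] / 1e6

print(subset)
```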



In [None]:
# Focused on graphing USA and China ONLY to highlight their dominance in private investment spending

# Filter the data for the top 10 sectors
top_sectors_data = filtered_data[filtered_data['Private Sector'].isin(top_sectors)]

# Create a bar plot with 'Year' and 'Value' in the hover tooltip
fig = px.bar(top_sectors_data, x='Private Sector', y=['China', 'USA'], title='Top Private Sectors (2013-2022) by Sum of Values (China and USA)',
             hover_name='Year', color_discrete_sequence=px.colors.qualitative.Set2)

# Show the plot
fig.show()

#### PRIVATE INVESTMENT IN AI DEVELOPMENT IN MEDICAL & HEALTHCARE FROM 2013-2022

In [None]:
# Checking for Missing Values

# Filter the data for "Medical & Healthcare" and years 2013-2022 (the file only contains 2017-2022 for this sector)
Medical_Healthcare = df_PI[(df_PI['Private Sector'] == 'Medical and healthcare') & (df_PI['Year'].between(2013, 2022))]

print(Medical_Healthcare)

             Private Sector  Year       World     EU & UK      China  \
108  Medical and healthcare  2017  2643466659   339881508  373718565   
109  Medical and healthcare  2018  5883191498   297563920  502976635   
110  Medical and healthcare  2019  5497100065  1142473179  369221918   
111  Medical and healthcare  2020  7524641500   455859599  548736964   
112  Medical and healthcare  2021  8615669754  1741353900  400878916   
113  Medical and healthcare  2022  5605698104   707457022  235405533   

            USA  
108  1787252592  
109  4752017398  
110  3572848788  
111  5959282699  
112  5736596464  
113  3883317553  
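
The same sector-and-year filter recurs in several cells of this notebook. A small helper could keep it in one place. This is a sketch only: `sector_slice` is a hypothetical function name, and the demo rows are copied from records printed in this notebook.

```python
import pandas as pd

def sector_slice(df, sector, first_year=2013, last_year=2022):
    """Return the rows for one sector within an inclusive year range."""
    mask = (df["Private Sector"] == sector) & df["Year"].between(first_year, last_year)
    # Return a copy so later column assignments do not touch the source frame.
    return df.loc[mask].copy()

# Demo rows copied from records printed in this notebook.
demo = pd.DataFrame({
    "Private Sector": ["AI ventures", "Medical and healthcare", "Medical and healthcare"],
    "Year": [2017, 2017, 2018],
    "USA": [404163765, 1787252592, 4752017398],
})

medical = sector_slice(demo, "Medical and healthcare")
print(medical)
```

With the notebook's data, a call such as `sector_slice(df_PI, 'Cybersecurity')` would stand in for the repeated boolean-mask expression.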


In [None]:
# CODE NOT NEEDED. THERE ARE NO MISSING VALUES IN THE MEDICAL & HEALTHCARE PRIVATE SECTOR

# # Calculate the average for the "EU & UK" and "China" columns
# average_EU_UK = Medical_Healthcare['EU & UK'].mean()
# average_China = Medical_Healthcare['China'].mean()
# average_USA = Medical_Healthcare['USA'].mean()
# average_World = Medical_Healthcare['World'].mean()

# # Replace missing values with the calculated averages
# Medical_Healthcare['EU & UK'].replace(0, average_EU_UK, inplace=True)
# Medical_Healthcare['China'].replace(0, average_China, inplace=True)
# Medical_Healthcare['USA'].replace(0, average_USA, inplace=True)
# Medical_Healthcare['World'].replace(0, average_World, inplace=True)

# # Print the updated DataFrame
# print(Medical_Healthcare)

In [None]:
# Calculate the average for each region
average_by_region = Medical_Healthcare.groupby('Private Sector').mean()

print(average_by_region)

                          Year         World       EU & UK         China  \
Private Sector                                                             
Medical and healthcare  2019.5  5.961628e+09  7.807649e+08  4.051564e+08   

                                 USA  
Private Sector                        
Medical and healthcare  4.281886e+09  


In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Filter the data for "Medical & Healthcare" and years 2013-2022
Medical_Healthcare = df_PI[(df_PI['Private Sector'] == 'Medical and healthcare') & (df_PI['Year'].between(2013, 2022))]

# Create a horizontal bar plot for the "EU & UK" column
fig_eu_uk = px.bar(Medical_Healthcare, x='EU & UK', y='Year', orientation='h',
                   title='Private Investment in AI Development in the Medical & Healthcare Sector (EU & UK)',
                   color_discrete_sequence=px.colors.qualitative.Prism)
fig_eu_uk.update_xaxes(title_text='Value')

# Create a horizontal bar plot for the "USA" column
fig_usa = px.bar(Medical_Healthcare, x='USA', y='Year', orientation='h',
                 title='Private Investment in AI Development in the Medical & Healthcare Sector (USA)')
fig_usa.update_xaxes(title_text='Value')

# Create a horizontal bar plot for the "China" column with plain red bars
fig_china = px.bar(Medical_Healthcare, x='China', y='Year', orientation='h',
                   title='Private Investment in AI Development in the Medical & Healthcare Sector (China)',
                   color_discrete_sequence=['red'])
fig_china.update_xaxes(title_text='Value')
fig_china.update_layout(showlegend=False)

# Combine the three plots with make_subplots, using adjusted subplot sizes
fig_combined = make_subplots(rows=1, cols=3,
                             subplot_titles=['EU & UK', 'USA', 'China'],
                             shared_yaxes=True,
                             row_heights=[1],  # Adjust the height of each row
                             column_widths=[0.5, 0.5, 0.5],  # Adjust the width of each column
                             horizontal_spacing=0.03)  # Adjust the spacing between subplots

# Add each subplot to the combined figure
fig_combined.add_trace(fig_eu_uk['data'][0], row=1, col=1)
fig_combined.add_trace(fig_usa['data'][0], row=1, col=2)
fig_combined.add_trace(fig_china['data'][0], row=1, col=3)

# Update layout
fig_combined.update_layout(title_text='Private Investment in AI Development in the Medical & Healthcare Sector',
                           height=500, width=1500)

# Show the combined plot
fig_combined.show()



In [None]:
# Create a horizontal bar plot for the "EU & UK"column
fig = px.bar(Medical_Healthcare, x=['EU & UK'], y='Year', orientation='h', title='Private Investment in AI Development in the Medical & Healthcare Sector (EU & UK)', color_discrete_sequence=px.colors.qualitative.Prism)

# Add x-axis title
fig.update_xaxes(title_text='Value')

# Show the plot
fig.show()

In [None]:
# Create a horizontal bar plot for the "USA" column
fig = px.bar(Medical_Healthcare, x='USA', y='Year', orientation='h', title='Private Investment in AI Development in the Medical & Healthcare Sector (USA)')
fig.update_xaxes(title_text='Value')


# Show the plot
fig.show()

In [None]:
# Create a horizontal bar plot for the "CHINA" column with plain red bars
fig = px.bar(Medical_Healthcare, x='China', y='Year', orientation='h', title='Private Investment in AI Development in the Medical & Healthcare Sector (China)',
             color="Private Sector", color_discrete_sequence=['red'])

# Add x-axis title
fig.update_xaxes(title_text='Value')

# Remove the legend
fig.update_layout(showlegend=False)

# Show the plot
fig.show()


In [None]:
# Filter the data for "Medical & Healthcare" and years 2013-2022
Medical_Healthcare = df_PI[(df_PI['Private Sector'] == 'Medical and healthcare') & (df_PI['Year'].between(2013, 2022))]

# Plot the data using Plotly Express
fig = px.line(Medical_Healthcare, x='Year', y=['World', 'EU & UK', 'China', 'USA'], title='Medical & Healthcare Private Sector Over Time', markers=True)

# Customize the plot
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the plot
fig.show()

In [None]:
Medical_Healthcare = df_PI[(df_PI['Private Sector'] == 'Medical and healthcare') & (df_PI['Year'].between(2013, 2022))]

# Plot the histogram using Plotly Express with pastel colors and transparency
fig = px.histogram(Medical_Healthcare,
                   x='Year',
                   y=['EU & UK', 'China', 'USA'],
                   title='Medical & Healthcare Private Investment Distribution From 2017-2022',
                   color_discrete_map={
                       'EU & UK': 'lightblue',  # Choose pastel color for 'EU & UK'
                       'China': 'lightgreen',   # Choose pastel color for 'China'
                       'USA': 'lightcoral'      # Choose pastel color for 'USA'
                   },
                   opacity=0.98  # Adjust transparency (0 is fully transparent, 1 is fully opaque)
)

# Customize the plot
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the plot
fig.show()

#### Private Investments In Cybersecurity

In [None]:
# Checking for Missing Values

# Filter the data for "Cybersecurity" and years 2013-2022 (the file only contains 2017-2022 for this sector)
Cybersecurity = df_PI[(df_PI['Private Sector'] == 'Cybersecurity') & (df_PI['Year'].between(2013, 2022))]

print(Cybersecurity)

# Calculate the average for each region
average_by_region = Cybersecurity.groupby('Private Sector').mean()

print('Average By Region:', average_by_region)

import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Filter the data for "Cybersecurity" and years 2013-2022
Cybersecurity = df_PI[(df_PI['Private Sector'] == 'Cybersecurity') & (df_PI['Year'].between(2013, 2022))]

# Create a horizontal bar plot for the "EU & UK" column
fig_eu_uk = px.bar(Cybersecurity, x='EU & UK', y='Year', orientation='h',
                   title='Private Investment in AI Development in the Cybersecurity Sector (EU & UK)',
                   color_discrete_sequence=px.colors.qualitative.Prism)
fig_eu_uk.update_xaxes(title_text='Value')

# Create a horizontal bar plot for the "USA" column
fig_usa = px.bar(Cybersecurity, x='USA', y='Year', orientation='h',
                 title='Private Investment in AI Development in the Cybersecurity Sector (USA)')
fig_usa.update_xaxes(title_text='Value')

# Create a horizontal bar plot for the "China" column with plain red bars
fig_china = px.bar(Cybersecurity, x='China', y='Year', orientation='h',
                   title='Private Investment in AI Development in the Cybersecurity Sector (China)',
                   color_discrete_sequence=['red'])
fig_china.update_xaxes(title_text='Value')
fig_china.update_layout(showlegend=False)

# Combine the three plots with make_subplots, using adjusted subplot sizes
fig_combined = make_subplots(rows=1, cols=3,
                             subplot_titles=['EU & UK', 'USA', 'China'],
                             shared_yaxes=True,
                             row_heights=[1],  # Adjust the height of each row
                             column_widths=[0.5, 0.5, 0.5],  # Adjust the width of each column
                             horizontal_spacing=0.03)  # Adjust the spacing between subplots

# Add each subplot to the combined figure
fig_combined.add_trace(fig_eu_uk['data'][0], row=1, col=1)
fig_combined.add_trace(fig_usa['data'][0], row=1, col=2)
fig_combined.add_trace(fig_china['data'][0], row=1, col=3)

# Update layout
fig_combined.update_layout(title_text='Private Investment in AI Development in the Cybersecurity Sector',
                           height=500, width=1500)

# Show the combined plot
fig_combined.show()



# Filter the data for "Cybersecurity" and years 2013-2022
Cybersecurity = df_PI[(df_PI['Private Sector'] == 'Cybersecurity') & (df_PI['Year'].between(2013, 2022))]

# Plot the data using Plotly Express
fig = px.line(Cybersecurity, x='Year', y=['World', 'EU & UK', 'China', 'USA'], title='A Global Comparison: Cybersecurity Private Sector Investment Distribution From 2017-2022', markers=True)

# Customize the plot
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the plot
fig.show()

Cybersecurity = df_PI[(df_PI['Private Sector'] == 'Cybersecurity') & (df_PI['Year'].between(2013, 2022))]

# Plot the histogram using Plotly Express with pastel colors and transparency
fig = px.histogram(Cybersecurity,
                   x='Year',
                   y=['EU & UK', 'China', 'USA'],
                   title='Cybersecurity Private Sector Investment Distribution From 2017-2022',
                   color_discrete_map={
                       'EU & UK': 'teal',   # Choose a color for 'EU & UK'
                       'China': 'orange',  # Choose a color for 'China'
                       'USA': 'purple',    # Choose a color for 'USA'
                   },
                   opacity=0.98  # Adjust transparency (0 is fully transparent, 1 is fully opaque)
)

# Customize the plot
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Value',
)

# Show the plot
fig.show()


   Private Sector  Year       World    EU & UK       China         USA
18  Cybersecurity  2017  2477972877  211647287   799646679   895792147
19  Cybersecurity  2018  4071620825  172523985   583572923  1918326970
20  Cybersecurity  2019  4110878261  161192831  1962121344  1713555083
21  Cybersecurity  2020  3798115473  352707616   519832488  2821490011
22  Cybersecurity  2021  4077164559   85748120  1063167893  2810270650
23  Cybersecurity  2022  4978670210  217439686   992985857  3578690067
Average By Region:                   Year         World       EU & UK        China           USA
Private Sector                                                               
Cybersecurity   2019.5  3.919070e+09  2.002099e+08  986887864.0  2.289687e+09


### PREDICTIVE TIME SERIES

In [None]:
!pip install statsmodels




#### FORECASTING ON THE TOP 5 SECTORS

#### AUTOREGRESSIVE MOVING-AVERAGE (ARMA) MODEL: FORECASTING THE NEXT FIVE YEARS OF PRIVATE INVESTMENTS IN AI IN EACH OF THE FEATURES [USA, CHINA, EU & UK]

## ARMA MODEL
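
For reference, a sketch of the ARMA(1, 1) form that `order=(1, 0, 1)` corresponds to, written in the mean-deviation convention. The symbols are notation introduced here, not taken from the notebook: $y_t$ is the yearly investment, $\mu$ the process mean, $\phi$ the autoregressive coefficient, $\theta$ the moving-average coefficient, and $\varepsilon_t$ a white-noise error.

$$
y_t - \mu = \phi\,(y_{t-1} - \mu) + \varepsilon_t + \theta\,\varepsilon_{t-1}
$$

Under this form, with $|\phi| < 1$, multi-step forecasts decay toward $\mu$ as the horizon grows, which is consistent with several of the forecast columns below levelling off near the series average.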

In [None]:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error

# TOP 5 SECTORS

# Filter the data for the selected sectors and years
filtered_data = df_PI[
    (df_PI['Year'].between(2013, 2022)) &
    (df_PI['Private Sector'] != 'Total')
]

# Select specific features for the model (excluding 'Year', 'Private Sector', and 'World')
features = ['EU & UK', 'China', 'USA']

# Function to run ARIMA model for each private sector and feature
def run_arma_model(sector_data):
    forecasts = {}

    for feature in features:
        # Fit ARIMA model
        model = ARIMA(sector_data[feature], order=(1, 0, 1))  # Adjusted order to (1, 0, 1)
        results = model.fit()

        # Forecast the next 5 years
        forecast = results.get_forecast(steps=5)

        # Extract forecast values
        forecast_values = forecast.predicted_mean

        # Store the forecasts in the dictionary
        forecasts[feature] = forecast_values

    return forecasts

# Dictionary to store forecasts and evaluation metrics for each sector
results_dict = {}

# Select the top 5 sectors
top5_sectors = top_sectors[:5]

# Loop through each top sector
for sector in top5_sectors:
    # Filter data for the specific sector
    sector_data = filtered_data[filtered_data['Private Sector'] == sector].set_index('Year')

    # Run ARIMA model and get forecasts for each feature
    forecasts = run_arma_model(sector_data)

    # Store the forecasts in the dictionary
    results_dict[sector] = forecasts

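    # Compare the 5-step-ahead forecasts with the last 5 observed values using MSE and MAE
    # (the forecasts cover future years, so this is a rough scale check, not an out-of-sample test)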
    for feature in features:
        true_values = sector_data[feature].iloc[-5:].values
        mse = mean_squared_error(true_values, forecasts[feature])
        mae = mean_absolute_error(true_values, forecasts[feature])

        print(f"\nResults for {sector} - {feature}:")
        print("Forecast:", forecasts[feature])
        print("True Values:", true_values)
        print("Mean Squared Error (MSE):", mse)
        print("Mean Absolute Error (MAE):", mae)

# Build the forecast DataFrame, with one (sector, feature) column per forecast series
forecast_df = pd.DataFrame.from_dict({(i, j): results_dict[i][j]
                                       for i in results_dict.keys()
                                       for j in results_dict[i].keys()},
                                     orient='columns')

# Display the forecast DataFrame
print(forecast_df)
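
The evaluation loop above scores forecasts for future years against values the model was already fitted on. A minimal sketch of a hold-out check follows, assuming the last `horizon` years are withheld from fitting. `holdout_errors` and `mean_forecaster` are hypothetical names. The mean forecaster is only a stand-in for a fitted model, not the notebook's ARMA. The figures are the USA Cybersecurity values for 2017-2022 printed earlier in this notebook.

```python
import numpy as np

def holdout_errors(series, forecaster, horizon=2):
    """Fit on all but the last `horizon` points, forecast them, and score."""
    values = np.asarray(series, dtype=float)
    train, test = values[:-horizon], values[-horizon:]
    predicted = np.asarray(forecaster(train, horizon), dtype=float)
    mse = float(np.mean((test - predicted) ** 2))
    mae = float(np.mean(np.abs(test - predicted)))
    return mse, mae

def mean_forecaster(train, horizon):
    # Hypothetical stand-in for a fitted model: repeat the training mean.
    return np.full(horizon, train.mean())

# USA Cybersecurity values for 2017-2022, copied from the printed table.
usa_cyber = [895792147, 1918326970, 1713555083,
             2821490011, 2810270650, 3578690067]

mse, mae = holdout_errors(usa_cyber, mean_forecaster, horizon=2)
print(f"Hold-out MAE: {mae:,.2f}")
```

To score the ARMA model this way, `forecaster` could be a function that fits `ARIMA(train, order=(1, 0, 1))` and returns `results.forecast(steps=horizon)`. With only six yearly observations per sector, any such error estimate will be noisy.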



An unsupported index was provided and will be ignored when e.g. forecasting.




Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.


Non-invertible starting MA parameters found. Using zeros as starting parameters.


No supported index is available. Prediction results will be given with an integer index beginning at `start`.


No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
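
The index warnings above arise because the series is indexed by plain integer years. A minimal sketch of giving the series an annual `PeriodIndex` before modelling follows. The figures are the USA Cybersecurity values printed earlier in this notebook, and `year_index` and `usa_series` are names introduced here for illustration.

```python
import pandas as pd

# USA Cybersecurity figures copied from the table printed earlier.
sector_data = pd.DataFrame({
    "Year": [2017, 2018, 2019, 2020, 2021, 2022],
    "USA": [895792147, 1918326970, 1713555083,
            2821490011, 2810270650, 3578690067],
})

# Years are converted through strings so each one is parsed as a calendar year.
year_index = pd.PeriodIndex([str(y) for y in sector_data["Year"]], freq="Y")

# Series with an explicit annual frequency instead of a bare integer index.
usa_series = pd.Series(sector_data["USA"].to_numpy(), index=year_index, name="USA")

print(usa_series.index)
```

A series indexed this way carries an annual frequency, and forecasts from a model fitted on it would be labelled with years instead of integer positions. Whether every warning disappears in the author's environment is untested here.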




Results for Data management - EU & UK:
Forecast: 6     2.985403e+08
7     3.691361e+08
8     3.957551e+08
9     4.057921e+08
10    4.095766e+08
Name: predicted_mean, dtype: float64
True Values: [ 183793071  227069721  323175209 1242036553  219221105]
Mean Squared Error (MSE): 1.5483153356859667e+17
Mean Absolute Error (MAE): 271198705.20529956

Results for Data management - China:
Forecast: 6     1.493997e+09
7     1.508460e+09
8     1.498834e+09
9     1.505241e+09
10    1.500977e+09
Name: predicted_mean, dtype: float64
True Values: [1671044273 1136812833 1277545545 2134058648 1727359990]
Mean Squared Error (MSE): 1.3301952357850107e+17
Mean Absolute Error (MAE): 325036927.21468484

Results for Data management - USA:
Forecast: 6     2.934691e+09
7     4.133360e+09
8     3.618448e+09
9     3.839639e+09
10    3.744622e+09
Name: predicted_mean, dtype: float64
True Values: [3132928239 4196778524 4627080938 5486092409 2894398654]
Mean Squared Error (MSE): 8.988696416079e+17
Mean Absolute E




Results for Medical and healthcare - EU & UK:
Forecast: 6     1.063049e+09
7     4.984802e+08
8     1.063049e+09
9     4.984802e+08
10    1.063049e+09
Name: predicted_mean, dtype: float64
True Values: [ 297563920 1142473179  455859599 1741353900  707457022]
Mean Squared Error (MSE): 6.081111079774405e+17
Mean Absolute Error (MAE): 723026899.3142151

Results for Medical and healthcare - China:
Forecast: 6     4.347000e+08
7     4.172059e+08
8     4.100708e+08
9     4.071608e+08
10    4.059739e+08
Name: predicted_mean, dtype: float64
True Values: [502976635 369221918 548736964 400878916 235405533]
Mean Squared Error (MSE): 1.1065096945975568e+16
Mean Absolute Error (MAE): 86355383.5814744

Results for Medical and healthcare - USA:
Forecast: 6     4.391718e+09
7     4.202464e+09
8     4.339317e+09
9     4.240356e+09
10    4.311917e+09
Name: predicted_mean, dtype: float64
True Values: [4752017398 3572848788 5959282699 5736596464 3883317553]
Mean Squared Error (MSE): 1.1145901861519583e+18


Results for Financial technology - EU & UK:
Forecast: 6     6.571826e+08
7     5.819871e+08
8     5.374857e+08
9     5.111492e+08
10    4.955631e+08
Name: predicted_mean, dtype: float64
True Values: [223428044 332120920 690423903 503174637 868080225]
Mean Squared Error (MSE): 8.255976465041667e+16
Mean Absolute Error (MAE): 243410142.0150996

Results for Financial technology - China:
Forecast: 6     5.143496e+08
7     1.084045e+09
8     1.250182e+09
9     1.298632e+09
10    1.312761e+09
Name: predicted_mean, dtype: float64
True Values: [2468722003 2585267898  865858812  457814027   30369583]
Mean Squared Error (MSE): 1.7144894908107494e+18
Mean Absolute Error (MAE): 1192625555.906416

Results for Financial technology - USA:
Forecast: 6     2.575851e+09
7     2.591634e+09
8     2.586974e+09
9     2.588350e+09
10    2.587943e+09
Name: predicted_mean, dtype: float64
True Values: [1369445756 3965148753 1564548827 5209476610 2991720037]
Mean Squared Error (MSE): 2.2841296923815409e+18
Mean


Results for Cybersecurity - EU & UK:
Forecast: 6     2.045677e+08
7     1.984456e+08
8     2.009242e+08
9     1.999207e+08
10    2.003270e+08
Name: predicted_mean, dtype: float64
True Values: [172523985 161192831 352707616  85748120 217439686]
Mean Squared Error (MSE): 7756196871212163.0
Mean Absolute Error (MAE): 70473020.91944762

Results for Cybersecurity - China:
Forecast: 6     8.845887e+08
7     1.024921e+09
8     9.727480e+08
9     9.921448e+08
10    9.849335e+08
Name: predicted_mean, dtype: float64
True Values: [ 583572923 1962121344  519832488 1063167893  992985857]
Mean Squared Error (MSE): 2.3583946863429434e+17
Mean Absolute Error (MAE): 354041525.4881791

Results for Cybersecurity - USA:
Forecast: 6     3.151434e+09
7     2.913675e+09
8     2.741514e+09
9     2.616853e+09
10    2.526587e+09
Name: predicted_mean, dtype: float64
True Values: [1918326970 1713555083 2821490011 2810270650 3578690067]
Mean Squared Error (MSE): 8.223137068053993e+17
Mean Absolute Error (MAE): 75


Results for Sales enablement - EU & UK:
Forecast: 6     9.630212e+07
7     9.271090e+07
8     9.177110e+07
9     9.152516e+07
10    9.146080e+07
Name: predicted_mean, dtype: float64
True Values: [107607341  35216245  93847522  30339737 147968093]
Mean Squared Error (MSE): 2074897050997511.5
Mean Absolute Error (MAE): 37713803.67815183

Results for Sales enablement - China:
Forecast: 6     1.648159e+09
7     1.617895e+09
8     1.598413e+09
9     1.585871e+09
10    1.577798e+09
Name: predicted_mean, dtype: float64
True Values: [1863744207 2160764156  859738029 1088548323 1557779673]
Mean Squared Error (MSE): 2.2691100113210624e+17
Mean Absolute Error (MAE): 402893935.2741302

Results for Sales enablement - USA:
Forecast: 6     1.304988e+09
7     1.338207e+09
8     1.358184e+09
9     1.370199e+09
10    1.377424e+09
Name: predicted_mean, dtype: float64
True Values: [1212903127 1163191612 1504536420 2647310868 1039963332]
Mean Squared Error (MSE): 3.6108484186689274e+17
Mean Absolute Error
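
The error metrics printed above can be recomputed by hand, which also makes their scale easier to read. The sketch below recomputes MSE and MAE for the Cybersecurity - EU & UK series from the printed (rounded) forecast and true values, and adds RMSE, which is expressed in the same units as the investment amounts. RMSE is not part of the original output, and the helper name `error_metrics` is hypothetical.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return MSE, MAE, and RMSE for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = float(np.mean(errors ** 2))
    mae = float(np.mean(np.abs(errors)))
    return {"MSE": mse, "MAE": mae, "RMSE": float(np.sqrt(mse))}

# Values copied from the "Cybersecurity - EU & UK" output (forecasts are rounded as printed)
forecast = [2.045677e+08, 1.984456e+08, 2.009242e+08, 1.999207e+08, 2.003270e+08]
true_values = [172523985, 161192831, 352707616, 85748120, 217439686]

metrics = error_metrics(true_values, forecast)
print(metrics)
```

Because the printed forecasts are rounded, the recomputed MSE and MAE should agree with the reported values (about 7.76e15 and 7.05e7) only approximately. The RMSE comes out at roughly 88 million, which is easier to compare with yearly investment amounts than the squared error.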

In [None]:
# FORECAST PRESENTED IN TABLE FORMAT

import pandas as pd

# List to store DataFrames for each sector
sector_dfs = []

# Loop through each top sector
for sector in top5_sectors:
    # Filter data for the specific sector
    sector_data = filtered_data[filtered_data['Private Sector'] == sector].set_index('Year')

    # Run ARMA model and get forecasts for each feature
    forecasts = run_arma_model(sector_data)

    # Create a DataFrame for the sector
    sector_df = pd.DataFrame.from_dict(forecasts, orient='columns')
    sector_df['Sector'] = sector  # Add a column for the sector name
    sector_df = sector_df.set_index('Sector')

    # Append the DataFrame to the list
    sector_dfs.append(sector_df)

# Concatenate the DataFrames in the list to create two separate tables
table1 = pd.concat(sector_dfs[:2], axis=0)
table2 = pd.concat(sector_dfs[2:], axis=0)

# Display the tables

display(table1)


display(table2)



Sector,EU & UK,China,USA
Data management,298540300.0,1493997000.0,2934691000.0
Data management,369136100.0,1508460000.0,4133360000.0
Data management,395755100.0,1498834000.0,3618448000.0
Data management,405792100.0,1505241000.0,3839639000.0
Data management,409576600.0,1500977000.0,3744622000.0
Medical and healthcare,1063049000.0,434700000.0,4391718000.0
Medical and healthcare,498480200.0,417205900.0,4202464000.0
Medical and healthcare,1063049000.0,410070800.0,4339317000.0
Medical and healthcare,498480200.0,407160800.0,4240356000.0
Medical and healthcare,1063049000.0,405973900.0,4311917000.0


Sector,EU & UK,China,USA
Financial technology,657182600.0,514349600.0,2575851000.0
Financial technology,581987100.0,1084045000.0,2591634000.0
Financial technology,537485700.0,1250182000.0,2586974000.0
Financial technology,511149200.0,1298632000.0,2588350000.0
Financial technology,495563100.0,1312761000.0,2587943000.0
Cybersecurity,204567700.0,884588700.0,3151434000.0
Cybersecurity,198445600.0,1024921000.0,2913675000.0
Cybersecurity,200924200.0,972748000.0,2741514000.0
Cybersecurity,199920700.0,992144800.0,2616853000.0
Cybersecurity,200327000.0,984933500.0,2526587000.0
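
In the tables above every forecast row carries only the sector name, so the five rows per sector cannot be told apart. A minimal sketch of one way to label them follows, assuming the five steps correspond to 2023 through 2027 (the same assumption the plotting cell makes). The helper `label_forecast_years` is hypothetical, and the sample values are the Data management rows copied from the first table.

```python
import pandas as pd

def label_forecast_years(sector, forecasts, first_year=2023):
    """Return a DataFrame indexed by (Sector, Year) from a dict of per-region forecasts."""
    df = pd.DataFrame(forecasts).reset_index(drop=True)
    # Assumption: consecutive forecast steps map to consecutive calendar years
    df["Year"] = range(first_year, first_year + len(df))
    df["Sector"] = sector
    return df.set_index(["Sector", "Year"])

# Data management forecasts copied from the table above
data_management = {
    "EU & UK": [298540300.0, 369136100.0, 395755100.0, 405792100.0, 409576600.0],
    "China": [1493997000.0, 1508460000.0, 1498834000.0, 1505241000.0, 1500977000.0],
    "USA": [2934691000.0, 4133360000.0, 3618448000.0, 3839639000.0, 3744622000.0],
}

labeled = label_forecast_years("Data management", data_management)
print(labeled)
```

Frames built this way can still be combined with `pd.concat`, and a single forecast can then be looked up by sector and year, for example `labeled.loc[("Data management", 2024), "USA"]`.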


## Plotting ARMA Model


In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots  # required for make_subplots below

# One color per feature (region)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# Create a list to store individual DataFrames for each sector
sector_dfs = []

# Loop through each top sector
for sector, forecasts in results_dict.items():
    # Create a DataFrame for the sector's forecasts
    sector_df = pd.DataFrame(forecasts)
    sector_df['Year'] = range(2023, 2028)  # Assuming the forecast covers the next 5 years (2023-2027)
    sector_df['Sector'] = sector

    # Append the DataFrame to the list
    sector_dfs.append(sector_df)

# Concatenate the DataFrames in the list to create a single DataFrame
forecast_graph_df = pd.concat(sector_dfs, ignore_index=True)

# Plot the investments against years for the top 5 sectors in separate subplots
fig = make_subplots(rows=5, cols=1, shared_xaxes=True, subplot_titles=list(top5_sectors))

# Loop through each sector
for i, sector in enumerate(top5_sectors):
    sector_rows = forecast_graph_df[forecast_graph_df['Sector'] == sector]
    for j, feature in enumerate(features):
        # Add one line per feature; list each feature in the legend only once
        trace = go.Scatter(x=sector_rows['Year'], y=sector_rows[feature],
                           mode='lines', name=feature, line=dict(color=colors[j]),
                           legendgroup=feature, showlegend=(i == 0))
        fig.add_trace(trace, row=i+1, col=1)

# Update layout
fig.update_layout(title_text="Next 5-Year Forecast for Top 5 Sectors",
                  showlegend=True,
                  height=1200,
                  width=800)

# Label the shared x-axis on the bottom subplot and every y-axis
fig.update_xaxes(title_text="Year", row=5, col=1)
fig.update_yaxes(title_text="Investment Amount")

# Set x-axis ticks to show only the forecast years
fig.update_xaxes(tickvals=list(range(2023, 2028)))

# Show plot
fig.show()
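
The repeated statsmodels warnings about an unsupported index in the model outputs most likely stem from indexing each series by plain integer years via `set_index('Year')`. One common remedy, sketched here under that assumption, is to convert the years to an annual `PeriodIndex` before fitting. The helper name `with_annual_index` is hypothetical, and the sample series reuses the five Data management - USA true values, assumed here to cover 2018 to 2022.

```python
import pandas as pd

def with_annual_index(series):
    """Return a copy of a year-indexed Series with an annual PeriodIndex."""
    out = series.copy()
    # Parse the integer years as dates, then convert them to yearly periods
    years = pd.to_datetime(out.index.astype(str), format="%Y")
    out.index = years.to_period("Y")
    return out

# Illustrative series (the year labels are an assumption)
usa = pd.Series(
    [3132928239, 4196778524, 4627080938, 5486092409, 2894398654],
    index=pd.Index([2018, 2019, 2020, 2021, 2022], name="Year"),
)

usa_annual = with_annual_index(usa)
print(usa_annual.index)
```

With an index of this kind, statsmodels should be able to label forecasts with years rather than integer positions such as 6 to 10.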


## SARIMA MODEL

In [None]:
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Assuming df_PI is your DataFrame with the provided data

# Filter the data for the selected sectors and years
filtered_data = df_PI[
    (df_PI['Year'].between(2013, 2022)) &
    (df_PI['Private Sector'] != 'Total')
]

# Select specific features for the model (excluding 'Year', 'Private Sector', 'Total')
features = ['EU & UK', 'China', 'USA']

# Function to run SARIMA model for a given private sector and features
def run_sarima_model(sector_data):
    # SARIMA order parameters (p, d, q) and seasonal parameters (P, D, Q, s)
    order = (1, 0, 1)
    seasonal_order = (1, 1, 1, 4)  # s=4 counts observations: with yearly data this is a 4-year cycle, not quarters

    # Fit SARIMA model
    model = SARIMAX(sector_data, order=order, seasonal_order=seasonal_order)
    results = model.fit(disp=False)

    # Forecast the next 5 years
    forecast = results.get_forecast(steps=5)

    # Extract forecast values
    forecast_values = forecast.predicted_mean

    return forecast_values

# Dictionary to store forecasts and evaluation metrics for each sector
results_dict_sarima = {}

# Select the top 5 sectors
top5_sectors = top_sectors[:5]

# Loop through each top sector
for sector in top5_sectors:
    # Filter data for the specific sector
    sector_data = filtered_data[filtered_data['Private Sector'] == sector].set_index('Year')

    # Run SARIMA model and get forecasts for each feature
    forecasts_sarima = {}
    for feature in features:
        forecast_sarima = run_sarima_model(sector_data[feature])
        forecasts_sarima[feature] = forecast_sarima

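        # Compare the 5-step forecast with the last 5 observed values using MSE and MAE.
        # Those years are part of the fitting data, so this is not an out-of-sample test.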
        true_values_sarima = sector_data[feature].iloc[-5:].values
        mse_sarima = mean_squared_error(true_values_sarima, forecast_sarima)
        mae_sarima = mean_absolute_error(true_values_sarima, forecast_sarima)

        print(f"\nResults for {sector} - {feature} (SARIMA):")
        print("Forecast:", forecast_sarima)
        print("True Values:", true_values_sarima)
        print("Mean Squared Error (MSE):", mse_sarima)
        print("Mean Absolute Error (MAE):", mae_sarima)

    # Store the forecasts in the dictionary
    results_dict_sarima[sector] = forecasts_sarima

# Build the SARIMA forecast DataFrame with one (sector, region) column per series
forecast_df_sarima = pd.DataFrame.from_dict({(i, j): results_dict_sarima[i][j]
                                             for i in results_dict_sarima.keys()
                                             for j in results_dict_sarima[i].keys()},
                                           orient='columns')

# Display the SARIMA forecast DataFrame
print(forecast_df_sarima)



Too few observations to estimate starting parameters for ARMA and trend. All parameters except for variances will be set to zeros.


Too few observations to estimate starting parameters for seasonal ARMA. All parameters except for variances will be set to zeros.


Results for Data management - EU & UK (SARIMA):
Forecast: 6     6.963201e+08
7     4.055969e+08
8     2.577334e+09
9     4.628942e+07
10    1.399326e+09
Name: predicted_mean, dtype: float64
True Values: [ 183793071  227069721  323175209 1242036553  219221105]
Mean Squared Error (MSE): 1.6396492497753482e+18
Mean Absolute Error (MAE): 1064212996.4208906

Results for Data management - China (SARIMA):
Forecast: 6     9.321895e+08
7     1.169605e+09
8     2.063410e+09
9     1.715885e+09
10    9.384035e+08
Name: predicted_mean, dtype: float64
True Values: [1671044273 1136812833 1277545545 2134058648 1727359990]
Mean Squared Error (MSE): 3.9237708990699725e+17
Mean Absolute Error (MAE): 552928142.4575989



Results for Data management - USA (SARIMA):
Forecast: 6     2.492813e+09
7     3.221642e+09
8     4.478280e+09
9     2.646588e+09
10    2.497261e+09
Name: predicted_mean, dtype: float64
True Values: [3132928239 4196778524 4627080938 5486092409 2894398654]
Mean Squared Error (MSE): 1.920656717810337e+18
Mean Absolute Error (MAE): 1000138943.9542294

Results for Medical and healthcare - EU & UK (SARIMA):
Forecast: 6     2.594197e+09
7     1.222727e+09
8     3.822800e+09
9     1.327668e+09
10    4.348514e+09
Name: predicted_mean, dtype: float64
True Values: [ 297563920 1142473179  455859599 1741353900  707457022]
Mean Squared Error (MSE): 6.009136238964649e+18
Mean Absolute Error (MAE): 1959714105.904694



Results for Medical and healthcare - China (SARIMA):
Forecast: 6     1.820301e+08
7     2.848171e+08
8     3.876694e+08
9     3.641707e+08
10    2.721082e+08
Name: predicted_mean, dtype: float64
True Values: [502976635 369221918 548736964 400878916 235405533]
Mean Squared Error (MSE): 2.7753644007559744e+16
Mean Absolute Error (MAE): 127965976.36260828

Results for Medical and healthcare - USA (SARIMA):
Forecast: 6     2.481885e+09
7     4.940547e+09
8     5.094827e+09
9     4.026755e+09
10    2.660792e+09
Name: predicted_mean, dtype: float64
True Values: [4752017398 3572848788 5959282699 5736596464 3883317553]
Mean Squared Error (MSE): 2.4379023043007457e+18
Mean Absolute Error (MAE): 1486930760.5796864

Results for Financial technology - EU & UK (SARIMA):
Forecast: 6     8.277415e+08
7     1.324086e+09
8     9.894429e+08
9     1.531832e+09
10    1.387868e+09
Name: predicted_mean, dtype: float64
True Values: [223428044 332120920 690423903 503174637 868080225]
Mean Squared Error (MSE)


Results for Financial technology - China (SARIMA):
Forecast: 6     3.464518e+08
7     4.136954e+07
8     1.412687e+09
9     2.178341e+09
10    2.349417e+09
Name: predicted_mean, dtype: float64
True Values: [2468722003 2585267898  865858812  457814027   30369583]
Mean Squared Error (MSE): 3.9225327074189583e+18
Mean Absolute Error (MAE): 1850514201.6034534

Results for Financial technology - USA (SARIMA):
Forecast: 6     6.235660e+09
7     3.835042e+09
8     7.479993e+09
9     5.262209e+09
10    8.506134e+09
Name: predicted_mean, dtype: float64
True Values: [1369445756 3965148753 1564548827 5209476610 2991720037]
Mean Squared Error (MSE): 1.7820195420083624e+19
Mean Absolute Error (MAE): 3295782093.8666887



Results for Cybersecurity - EU & UK (SARIMA):
Forecast: 6     1.457180e+08
7     3.588946e+08
8     8.356687e+07
9     2.182200e+08
10    1.454511e+08
Name: predicted_mean, dtype: float64
True Values: [172523985 161192831 352707616  85748120 217439686]
Mean Squared Error (MSE): 2.6994490748413348e+16
Mean Absolute Error (MAE): 139621793.56783164



Maximum Likelihood optimization failed to converge. Check mle_retvals


Results for Cybersecurity - China (SARIMA):
Forecast: 6     2.900377e+09
7     8.438545e+08
8     1.160429e+09
9     1.233113e+09
10    3.366270e+09
Name: predicted_mean, dtype: float64
True Values: [ 583572923 1962121344  519832488 1063167893  992985857]
Mean Squared Error (MSE): 2.53796473714596e+18
Mean Absolute Error (MAE): 1323779252.7878754



Results for Cybersecurity - USA (SARIMA):
Forecast: 6     3.149603e+09
7     4.069671e+09
8     3.887035e+09
9     4.512989e+09
10    3.957742e+09
Name: predicted_mean, dtype: float64
True Values: [1918326970 1713555083 2821490011 2810270650 3578690067]
Mean Squared Error (MSE): 2.2491281494859236e+18
Mean Absolute Error (MAE): 1346941628.9097714

Results for Sales enablement - EU & UK (SARIMA):
Forecast: 6     2.403983e+07
7     1.009128e+08
8     2.572915e+07
9     1.501160e+08
10    2.359489e+07
Name: predicted_mean, dtype: float64
True Values: [107607341  35216245  93847522  30339737 147968093]
Mean Squared Error (MSE): 9150947228790620.0
Mean Absolute Error (MAE): 92306392.95442335



Maximum Likelihood optimization failed to converge. Check mle_retvals


Results for Sales enablement - China (SARIMA):
Forecast: 6     1.976176e+09
7     7.830815e+08
8     1.146726e+09
9     1.595306e+09
10    2.012260e+09
Name: predicted_mean, dtype: float64
True Values: [1863744207 2160764156  859738029 1088548323 1557779673]
Mean Squared Error (MSE): 4.912737545480958e+17
Mean Absolute Error (MAE): 547668265.4419436

Results for Sales enablement - USA (SARIMA):
Forecast: 6     5.243545e+08
7     1.975783e+09
8     3.026282e+09
9     7.566737e+08
10    1.113940e+08
Name: predicted_mean, dtype: float64
True Values: [1212903127 1163191612 1504536420 2647310868 1039963332]
Mean Squared Error (MSE): 1.5773725794326236e+18
Mean Absolute Error (MAE): 1168418381.7552297
   Data management                             Medical and healthcare  \
           EU & UK         China           USA                EU & UK   
6     6.963201e+08  9.321895e+08  2.492813e+09           2.594197e+09   
7     4.055969e+08  1.169605e+09  3.221642e+09           1.222727e+09   
8 
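
The SARIMA cell compares a forecast of future years with the last five observed values, which were also used for fitting. A stricter check is to hold out the final years, fit on the rest, and score the forecast against the held-out values. The sketch below shows only the split-and-score logic. `holdout_scores` and `naive_last_value` are hypothetical helpers, and the naive last-value forecaster stands in for a real model (for instance a wrapper around `SARIMAX` that forecasts `n_test` steps). The sample values are the five true values printed for Data management - China.

```python
import numpy as np

def naive_last_value(train, steps):
    """Baseline forecaster: repeat the last training value for every step."""
    return np.repeat(float(train[-1]), steps)

def holdout_scores(values, n_test, forecaster):
    """Fit on all but the last n_test values and score the forecast on the rest."""
    values = np.asarray(values, dtype=float)
    if not 0 < n_test < len(values):
        raise ValueError("n_test must be between 1 and len(values) - 1")
    train, test = values[:-n_test], values[-n_test:]
    pred = np.asarray(forecaster(train, n_test), dtype=float)
    errors = pred - test
    return {"MSE": float(np.mean(errors ** 2)), "MAE": float(np.mean(np.abs(errors)))}

# True values copied from the "Data management - China" output
china = [1671044273, 1136812833, 1277545545, 2134058648, 1727359990]

scores = holdout_scores(china, n_test=2, forecaster=naive_last_value)
print(scores)
```

Scoring a simple baseline alongside each model gives a reference point: a model whose held-out error is no lower than the naive forecast adds little on this data.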



In [None]:
import pandas as pd

# Results data
results_data = {
    'Data management - EU & UK': {
        'Forecast': [6.963201e+08, 4.055969e+08, 2.577334e+09, 4.628942e+07, 1.399326e+09],
        'True Values': [183793071, 227069721, 323175209, 1242036553, 219221105],
        'MSE': 1.6396492497753482e+18,
        'MAE': 1064212996.4208906
    },
    'Data management - China': {
        'Forecast': [9.321895e+08, 1.169605e+09, 2.063410e+09, 1.715885e+09, 9.384035e+08],
        'True Values': [1671044273, 1136812833, 1277545545, 2134058648, 1727359990],
        'MSE': 3.9237708990699725e+17,
        'MAE': 552928142.4575989
    },
    'Data management - USA': {
        'Forecast': [2.492813e+09, 3.221642e+09, 4.478280e+09, 2.646588e+09, 2.497261e+09],
        'True Values': [3132928239, 4196778524, 4627080938, 5486092409, 2894398654],
        'MSE': 1.920656717810337e+18,
        'MAE': 1000138943.9542294
    },
    'Medical and healthcare - EU & UK': {
        'Forecast': [2.594197e+09, 1.222727e+09, 3.822800e+09, 1.327668e+09, 4.348514e+09],
        'True Values': [297563920, 1142473179, 455859599, 1741353900, 707457022],
        'MSE': 6.009136238964649e+18,
        'MAE': 1959714105.904694
    },
    'Medical and healthcare - China': {
        'Forecast': [1.820301e+08, 2.848171e+08, 3.876694e+08, 3.641707e+08, 2.721082e+08],
        'True Values': [502976635, 369221918, 548736964, 400878916, 235405533],
        'MSE': 2.7753644007559744e+16,
        'MAE': 127965976.36260828
    },
    'Medical and healthcare - USA': {
        'Forecast': [2.481885e+09, 4.940547e+09, 5.094827e+09, 4.026755e+09, 2.660792e+09],
        'True Values': [4752017398, 3572848788, 5959282699, 5736596464, 3883317553],
        'MSE': 2.4379023043007457e+18,
        'MAE': 1486930760.5796864
    },
    'Financial technology - EU & UK': {
        'Forecast': [8.277415e+08, 1.324086e+09, 9.894429e+08, 1.531832e+09, 1.387868e+09],
        'True Values': [223428044, 332120920, 690423903, 503174637, 868080225],
        'MSE': 5.5338345642462944e+17,
        'MAE': 688748586.4814819
    },
    'Financial technology - China': {
        'Forecast': [3.464518e+08, 4.136954e+07, 1.412687e+09, 2.178341e+09, 2.349417e+09],
        'True Values': [2468722003, 2585267898, 865858812, 457814027, 30369583],
        'MSE': 3.9225327074189583e+18,
        'MAE': 1850514201.6034534
    },
    'Financial technology - USA': {
        'Forecast': [6.235660e+09, 3.835042e+09, 7.479993e+09, 5.262209e+09, 8.506134e+09],
        'True Values': [1369445756, 3965148753, 1564548827, 5209476610, 2991720037],
        'MSE': 1.7820195420083624e+19,
        'MAE': 3295782093.8666887
    },
    'Cybersecurity - EU & UK': {
        'Forecast': [1.457180e+08, 3.588946e+08, 8.356687e+07, 2.182200e+08, 1.454511e+08],
        'True Values': [172523985, 161192831, 352707616, 85748120, 217439686],
        'MSE': 2.6994490748413348e+16,
        'MAE': 139621793.56783164
    },
    'Cybersecurity - China': {
        'Forecast': [2.900377e+09, 8.438545e+08, 1.160429e+09, 1.233113e+09, 3.366270e+09],
        'True Values': [583572923, 1962121344, 519832488, 1063167893, 992985857],
        'MSE': 2.53796473714596e+18,
        'MAE': 1323779252.7878754
    },
    'Cybersecurity - USA': {
        'Forecast': [3.149603e+09, 4.069671e+09, 3.887035e+09, 4.512989e+09, 3.957742e+09],
        'True Values': [1918326970, 1713555083, 2821490011, 2810270650, 3578690067],
        'MSE': 2.2491281494859236e+18,
        'MAE': 1346941628.9097714
    },
    'Sales enablement - EU & UK': {
        'Forecast': [2.403983e+07, 1.009128e+08, 2.572915e+07, 1.501160e+08, 2.359489e+07],
        'True Values': [107607341, 35216245, 93847522, 30339737, 147968093],
        'MSE': 9150947228790620.0,
        'MAE': 92306392.95442335
    },
    'Sales enablement - China': {
        'Forecast': [1.976176e+09, 7.830815e+08, 1.146726e+09, 1.595306e+09, 2.012260e+09],
        'True Values': [1863744207, 2160764156, 859738029, 1088548323, 1557779673],
        'MSE': 4.912737545480958e+17,
        'MAE': 547668265.4419436
    },
    'Sales enablement - USA': {
        'Forecast': [5.243545e+08, 1.975783e+09, 3.026282e+09, 7.566737e+08, 1.113940e+08],
        'True Values': [1212903127, 1163191612, 1504536420, 2647310868, 1039963332],
        'MSE': 1.5773725794326236e+18,
        'MAE': 1168418381.7552297
    }
}

# List to store one DataFrame per sector-region entry
sector_dfs = []

# Loop through each sector-region entry in results_data
for sector, data in results_data.items():
    # Build the DataFrame; the scalar 'MSE' and 'MAE' values are broadcast
    # across the five forecast rows, so they repeat on every row
    sector_df = pd.DataFrame(data)
    sector_df['Sector'] = sector  # Add a column for the sector name
    sector_df = sector_df.set_index('Sector')

    # Append the DataFrame to the list
    sector_dfs.append(sector_df)

# Split the results into two tables: the first six entries and the rest
table1 = pd.concat(sector_dfs[:6], axis=0)
table2 = pd.concat(sector_dfs[6:], axis=0)

# Display the tables
print("Table 1:")
display(table1)

print("\nTable 2:")
display(table2)


Table 1:


Sector,Forecast,True Values,MSE,MAE
Data management - EU & UK,696320100.0,183793071,1.639649e+18,1064213000.0
Data management - EU & UK,405596900.0,227069721,1.639649e+18,1064213000.0
Data management - EU & UK,2577334000.0,323175209,1.639649e+18,1064213000.0
Data management - EU & UK,46289420.0,1242036553,1.639649e+18,1064213000.0
Data management - EU & UK,1399326000.0,219221105,1.639649e+18,1064213000.0
Data management - China,932189500.0,1671044273,3.923771e+17,552928100.0
Data management - China,1169605000.0,1136812833,3.923771e+17,552928100.0
Data management - China,2063410000.0,1277545545,3.923771e+17,552928100.0
Data management - China,1715885000.0,2134058648,3.923771e+17,552928100.0
Data management - China,938403500.0,1727359990,3.923771e+17,552928100.0



Table 2:


Sector,Forecast,True Values,MSE,MAE
Financial technology - EU & UK,827741500.0,223428044,5.533835e+17,688748600.0
Financial technology - EU & UK,1324086000.0,332120920,5.533835e+17,688748600.0
Financial technology - EU & UK,989442900.0,690423903,5.533835e+17,688748600.0
Financial technology - EU & UK,1531832000.0,503174637,5.533835e+17,688748600.0
Financial technology - EU & UK,1387868000.0,868080225,5.533835e+17,688748600.0
Financial technology - China,346451800.0,2468722003,3.922533e+18,1850514000.0
Financial technology - China,41369540.0,2585267898,3.922533e+18,1850514000.0
Financial technology - China,1412687000.0,865858812,3.922533e+18,1850514000.0
Financial technology - China,2178341000.0,457814027,3.922533e+18,1850514000.0
Financial technology - China,2349417000.0,30369583,3.922533e+18,1850514000.0
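
The MSE and MAE columns above can be cross-checked directly from the `Forecast` and `True Values` lists. The sketch below does this for a single entry, 'Medical and healthcare - China', with the numbers copied from `results_data`. The helper name `error_metrics` is hypothetical and is not part of the notebook. It assumes the stored metrics are the plain mean squared error and mean absolute error over the five forecast points. Because the forecasts are stored rounded to seven significant digits, the recomputed values should agree only approximately.

```python
import numpy as np

def error_metrics(forecast, true_values):
    """Return (MSE, MAE) for two equal-length sequences (hypothetical helper)."""
    forecast = np.asarray(forecast, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    errors = forecast - true_values
    mse = float(np.mean(errors ** 2))      # mean of squared errors
    mae = float(np.mean(np.abs(errors)))   # mean of absolute errors
    return mse, mae

# Values copied from the 'Medical and healthcare - China' entry of results_data
forecast = [1.820301e+08, 2.848171e+08, 3.876694e+08, 3.641707e+08, 2.721082e+08]
true_values = [502976635, 369221918, 548736964, 400878916, 235405533]
stored_mse = 2.7753644007559744e+16
stored_mae = 127965976.36260828

mse, mae = error_metrics(forecast, true_values)
print(f"MSE: {mse:.6e} (stored: {stored_mse:.6e})")
print(f"MAE: {mae:,.2f} (stored: {stored_mae:,.2f})")

# Allow a small relative tolerance because the forecasts are rounded
print("MSE matches:", bool(np.isclose(mse, stored_mse, rtol=1e-4)))
print("MAE matches:", bool(np.isclose(mae, stored_mae, rtol=1e-4)))
```

Note that MSE is expressed in squared units of the investment figures, which is why it is many orders of magnitude larger than MAE. MAE is in the same units as the investments themselves, so it is the easier of the two to compare against the forecast and true values.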


In [None]:
import pandas as pd

# Forecast values per focus area and region
# (the same values as the 'Forecast' lists in results_data)
data = {
    'Data management': {
        'EU & UK': [6.963201e+08, 4.055969e+08, 2.577334e+09, 4.628942e+07, 1.399326e+09],
        'China': [9.321895e+08, 1.169605e+09, 2.063410e+09, 1.715885e+09, 9.384035e+08],
        'USA': [2.492813e+09, 3.221642e+09, 4.478280e+09, 2.646588e+09, 2.497261e+09]
    },
    'Medical and healthcare': {
        'EU & UK': [2.594197e+09, 1.222727e+09, 3.822800e+09, 1.327668e+09, 4.348514e+09],
        'China': [1.820301e+08, 2.848171e+08, 3.876694e+08, 3.641707e+08, 2.721082e+08],
        'USA': [2.481885e+09, 4.940547e+09, 5.094827e+09, 4.026755e+09, 2.660792e+09]
    },
    'Financial technology': {
        'EU & UK': [8.277415e+08, 1.324086e+09, 9.894429e+08, 1.531832e+09, 1.387868e+09],
        'China': [3.464518e+08, 4.136954e+07, 1.412687e+09, 2.178341e+09, 2.349417e+09],
        'USA': [6.235660e+09, 3.835042e+09, 7.479993e+09, 5.262209e+09, 8.506134e+09]
    },
    'Cybersecurity': {
        'EU & UK': [1.457180e+08, 3.588946e+08, 8.356687e+07, 2.182200e+08, 1.454511e+08],
        'China': [2.900377e+09, 8.438545e+08, 1.160429e+09, 1.233113e+09, 3.366270e+09],
        'USA': [3.149603e+09, 4.069671e+09, 3.887035e+09, 4.512989e+09, 3.957742e+09]
    },
    'Sales enablement': {
        'EU & UK': [2.403983e+07, 1.009128e+08, 2.572915e+07, 1.501160e+08, 2.359489e+07],
        'China': [1.976176e+09, 7.830815e+08, 1.146726e+09, 1.595306e+09, 2.012260e+09],
        'USA': [5.243545e+08, 1.975783e+09, 3.026282e+09, 7.566737e+08, 1.113940e+08]
    }
}

# Create one DataFrame per focus area: columns are regions,
# rows are the five forecast points labelled '6' through '10'
sector_dfs = {sector: pd.DataFrame(data=values, index=['6', '7', '8', '9', '10']) for sector, values in data.items()}

# Display the tables
for sector, df in sector_dfs.items():
    print(f"\n{sector}:\n")
    print(df)
    print("\n" + "=" * 50 + "\n")



Data management:

         EU & UK         China           USA
6   6.963201e+08  9.321895e+08  2.492813e+09
7   4.055969e+08  1.169605e+09  3.221642e+09
8   2.577334e+09  2.063410e+09  4.478280e+09
9   4.628942e+07  1.715885e+09  2.646588e+09
10  1.399326e+09  9.384035e+08  2.497261e+09



Medical and healthcare:

         EU & UK        China           USA
6   2.594197e+09  182030100.0  2.481885e+09
7   1.222727e+09  284817100.0  4.940547e+09
8   3.822800e+09  387669400.0  5.094827e+09
9   1.327668e+09  364170700.0  4.026755e+09
10  4.348514e+09  272108200.0  2.660792e+09



Financial technology:

         EU & UK         China           USA
6   8.277415e+08  3.464518e+08  6.235660e+09
7   1.324086e+09  4.136954e+07  3.835042e+09
8   9.894429e+08  1.412687e+09  7.479993e+09
9   1.531832e+09  2.178341e+09  5.262209e+09
10  1.387868e+09  2.349417e+09  8.506134e+09



Cybersecurity:

        EU & UK         China           USA
6   145718000.0  2.900377e+09  3.149603e+09
7   358894600.0 