<a href="https://colab.research.google.com/github/newb-dev-1008/CoVID_Entertainment_Impact/blob/ipython/Entertainment_during_CoVID_19.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Impact of CoVID-19 and nationwide lockdowns on the Entertainment Industry**

### **Objectives/ guidelines:**

*   Segregating all data by region
*   Determining the worst-hit countries and regions
*   Observing the spike in subscriptions to OTTs like **Netflix, Amazon Prime Video, Disney+ Hotstar**, etc.
*   Drawing insights into daily usage of devices connected to the Internet
*   Briefly comprehending the significance of television and social media as a source of entertainment
*   Accounting for the rise in online gaming and e-sports, and visualizing the impact on services like **Twitch**, **Steam** and **Discord**.
*   Consequently, understanding the increase in data usage.






## **Data Sources and Bibliography**

# **Part 1:**

### **Cleaning data and tracing the spread of CoVID-19 to determine the worst-hit countries**

After scraping websites on the Internet and reputed data warehouses, data in the form of **comma-separated values (CSV) files** has been obtained.

This data needs to be cleaned and organised, so that powerful functions from **Python's pandas and sklearn libraries** can be used for fruitful data analysis.

Data from the CSV files will be stored in pandas **DataFrames**, two-dimensional labeled data structures that store data in rows and columns for easy access.
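As a minimal sketch of what a DataFrame looks like, the snippet below parses three rows in the same layout as `covid_19_data.csv`. The rows are held in an in-memory string (an assumption made only so the sketch runs without the file), and the column set is shortened for illustration.

```python
import io
import pandas as pd

# Three illustrative rows in the same layout as covid_19_data.csv
# (held in a string here so the sketch runs without the file).
sample_csv = io.StringIO(
    "SNo,ObservationDate,Province/State,Country/Region,Confirmed\n"
    "1,01/22/2020,Anhui,Mainland China,1.0\n"
    "2,01/22/2020,Beijing,Mainland China,14.0\n"
    "3,01/22/2020,Chongqing,Mainland China,6.0\n"
)

sample = pd.read_csv(sample_csv)

# Rows are addressed by position or label, columns by name.
print(sample.shape)                       # (rows, columns)
print(sample['Confirmed'].sum())          # column-wise aggregation
print(sample.iloc[1]['Province/State'])   # second row, one column
```

Selecting a column by name returns a pandas Series, which is what calls such as `unique()` and `nunique()` operate on later in this notebook.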

In [0]:
import pandas as pd

latitudes = pd.read_csv('latitudes.csv')
covid_data = pd.read_csv('covid_19_data.csv')

In [2]:
latitudes_countries = latitudes.iloc[:, :4]
latitudes_countries.head()

Unnamed: 0,country_code,latitude,longitude,country
0,AD,42.546245,1.601554,Andorra
1,AE,23.424076,53.847818,United Arab Emirates
2,AF,33.93911,67.709953,Afghanistan
3,AG,17.060816,-61.796428,Antigua and Barbuda
4,AI,18.220554,-63.068615,Anguilla


In [3]:
covid_data.head()

Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
1,2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
2,3,01/22/2020,Chongqing,Mainland China,1/22/2020 17:00,6.0,0.0,0.0
3,4,01/22/2020,Fujian,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
4,5,01/22/2020,Gansu,Mainland China,1/22/2020 17:00,0.0,0.0,0.0


**Finding all countries listed in the Country/ Region column of CoVID Dataset:**

In [4]:
covid_data['Country/Region'].unique()   #Returns all unique labels contained in the column
covid_data['Country/Region'].nunique()   #Returns number of unique labels contained in the column

219

**Similarly, finding all countries listed under the 'Country' column of the latitudes dataset**

In [5]:
latitudes_countries['country'].unique()
latitudes_countries['country'].nunique()

245

# **We Observe:**
- China is called "Mainland China" in the CoVID-19 Dataset
- The latitudes dataset lists more countries (245) than the CoVID-19 dataset (219); the countries present in both datasets need to be matched with their corresponding latitudes and longitudes

Hence, we need to:
- Check for extra countries in either dataset
- Merge those countries manually, either into a larger country or under one shared label (e.g., if one dataset uses US while the other uses United States, both need to be merged into one label)
- Discard redundant countries/ countries with zero cases
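The checks listed above can be sketched on a pair of toy label lists. The names are taken from the two datasets, but the lists are shortened, and `name_map` is a hypothetical mapping chosen only for illustration; it is not necessarily the mapping applied later in this notebook.

```python
import pandas as pd

# Toy label lists; the names appear in the two datasets,
# but the lists themselves are shortened for illustration.
covid_names = pd.Series(['US', 'Mainland China', ' Azerbaijan', 'Azerbaijan', 'Japan'])
latitude_names = pd.Series(['United States', 'China', 'Azerbaijan', 'Japan', 'Antarctica'])

# Hypothetical mapping from CoVID-19 labels to latitude-dataset labels.
name_map = {'US': 'United States', 'Mainland China': 'China'}

# Strip stray whitespace first, then merge differing labels into one.
cleaned = covid_names.str.strip().replace(name_map)

# Labels still missing from the latitude dataset, and vice versa.
unmatched = cleaned[~cleaned.isin(latitude_names)].unique()
unused = latitude_names[~latitude_names.isin(cleaned)].unique()

print(unmatched)   # labels that still need attention
print(unused)      # latitude entries with no CoVID-19 records
```

Stripping whitespace before comparing matters here because the CoVID-19 dataset contains both `' Azerbaijan'` (with a leading space) and `'Azerbaijan'`, which would otherwise count as two different countries.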

In [6]:
latitudes_countries['country'].unique()

array(['Andorra', 'United Arab Emirates', 'Afghanistan',
       'Antigua and Barbuda', 'Anguilla', 'Albania', 'Armenia',
       'Netherlands Antilles', 'Angola', 'Antarctica', 'Argentina',
       'American Samoa', 'Austria', 'Australia', 'Aruba', 'Azerbaijan',
       'Bosnia and Herzegovina', 'Barbados', 'Bangladesh', 'Belgium',
       'Burkina Faso', 'Bulgaria', 'Bahrain', 'Burundi', 'Benin',
       'Bermuda', 'Brunei', 'Bolivia', 'Brazil', 'Bahamas', 'Bhutan',
       'Bouvet Island', 'Botswana', 'Belarus', 'Belize', 'Canada',
       'Cocos [Keeling] Islands', 'Congo [DRC]',
       'Central African Republic', 'Congo [Republic]', 'Switzerland',
       "Côte d'Ivoire", 'Cook Islands', 'Chile', 'Cameroon', 'China',
       'Colombia', 'Costa Rica', 'Cuba', 'Cape Verde', 'Christmas Island',
       'Cyprus', 'Czech Republic', 'Germany', 'Djibouti', 'Denmark',
       'Dominica', 'Dominican Republic', 'Algeria', 'Ecuador', 'Estonia',
       'Egypt', 'Western Sahara', 'Eritrea', 'Spain', 'Ethi

In [7]:
covid_data['Country/Region'].unique()

array(['Mainland China', 'Hong Kong', 'Macau', 'Taiwan', 'US', 'Japan',
       'Thailand', 'South Korea', 'Singapore', 'Philippines', 'Malaysia',
       'Vietnam', 'Australia', 'Mexico', 'Brazil', 'Colombia', 'France',
       'Nepal', 'Canada', 'Cambodia', 'Sri Lanka', 'Ivory Coast',
       'Germany', 'Finland', 'United Arab Emirates', 'India', 'Italy',
       'UK', 'Russia', 'Sweden', 'Spain', 'Belgium', 'Others', 'Egypt',
       'Iran', 'Israel', 'Lebanon', 'Iraq', 'Oman', 'Afghanistan',
       'Bahrain', 'Kuwait', 'Austria', 'Algeria', 'Croatia',
       'Switzerland', 'Pakistan', 'Georgia', 'Greece', 'North Macedonia',
       'Norway', 'Romania', 'Denmark', 'Estonia', 'Netherlands',
       'San Marino', ' Azerbaijan', 'Belarus', 'Iceland', 'Lithuania',
       'New Zealand', 'Nigeria', 'North Ireland', 'Ireland', 'Luxembourg',
       'Monaco', 'Qatar', 'Ecuador', 'Azerbaijan', 'Czech Republic',
       'Armenia', 'Dominican Republic', 'Indonesia', 'Portugal',
       'Andorra', 'Latvia

In [0]:
# Define a function to separate out the extra countries in either dataset
import numpy as np

def extra_countries(c1, c2):
  """Return the labels in c1 that do not appear in c2, in their original order."""
  known = set(c2)   # set membership avoids a nested loop over c2
  return np.array([name for name in c1 if name not in known])

In [9]:
#Finding countries that are in latitudes.csv but not in covid_19_data.csv

extra_countries(latitudes_countries['country'].unique(), covid_data['Country/Region'].unique())

array(['Anguilla', 'Netherlands Antilles', 'Antarctica', 'American Samoa',
       'Bermuda', 'Bouvet Island', 'Cocos [Keeling] Islands',
       'Congo [DRC]', 'Congo [Republic]', "Côte d'Ivoire", 'Cook Islands',
       'China', 'Christmas Island', 'Falkland Islands [Islas Malvinas]',
       'Micronesia', 'United Kingdom',
       'South Georgia and the South Sandwich Islands', 'Gaza Strip',
       'Heard Island and McDonald Islands', 'Isle of Man',
       'British Indian Ocean Territory', 'Kiribati', 'Comoros',
       'North Korea', 'Lesotho', 'Marshall Islands', 'Macedonia [FYROM]',
       'Myanmar [Burma]', 'Northern Mariana Islands', 'Montserrat',
       'New Caledonia', 'Norfolk Island', 'Nauru', 'Niue',
       'French Polynesia', 'Saint Pierre and Miquelon',
       'Pitcairn Islands', 'Palestinian Territories', 'Palau', 'Réunion',
       'Solomon Islands', 'Saint Helena', 'Svalbard and Jan Mayen',
       'São Tomé and Príncipe', 'Swaziland', 'Turks and Caicos Islands',
       'Fren

In [10]:
#Finding countries that are in covid_19_data.csv but not in latitudes.csv

extra_countries(covid_data['Country/Region'].unique(),latitudes_countries['country'].unique())

array(['Mainland China', 'US', 'Ivory Coast', 'UK', 'Others',
       'North Macedonia', ' Azerbaijan', 'North Ireland',
       'Saint Barthelemy', 'Palestine', 'Republic of Ireland',
       'St. Martin', 'occupied Palestinian territory', "('St. Martin',)",
       'Channel Islands', 'Holy See', 'Congo (Kinshasa)', 'Reunion',
       'Curacao', 'Eswatini', 'Congo (Brazzaville)',
       'Republic of the Congo', 'The Bahamas', 'The Gambia',
       'Gambia, The', 'Bahamas, The', 'Cabo Verde', 'East Timor',
       'Diamond Princess', 'West Bank and Gaza', 'Burma', 'MS Zaandam',
       'South Sudan', 'Sao Tome and Principe'], dtype='<U30')

**Replace country names in CoVID-19 dataset with matching names from Latitudes dataset**

In [0]:
# Labels used in the CoVID-19 dataset ...
old_names = ['Burma', 'Congo (Brazzaville)', 'Congo (Kinshasa)',
             "Cote d'Ivoire", 'Czechia', 'Diamond Princess',
             'Korea, South', 'North Macedonia',
             'South Sudan', 'Taiwan*', 'US',
             'West Bank and Gaza']
# ... and the matching labels in the latitudes dataset, in the same order.
# Congo (Brazzaville) is the Republic of the Congo; Congo (Kinshasa) is the DRC.
new_names = ['Myanmar [Burma]', 'Congo [Republic]', 'Congo [DRC]',
             "Côte d'Ivoire", 'Czech Republic', 'Japan',
             'South Korea', 'Macedonia [FYROM]',
             'Sudan', 'Taiwan', 'United States',
             'Gaza Strip']
corrected_covid_data = covid_data.replace(old_names, new_names)

countries_to_drop = ['Holy See','Eswatini', 'Cabo Verde','MS Zaandam',
                     'Sao Tome and Principe']

In [12]:
corrected_covid_data.head()

Unnamed: 0,SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
0,1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
1,2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
2,3,01/22/2020,Chongqing,Mainland China,1/22/2020 17:00,6.0,0.0,0.0
3,4,01/22/2020,Fujian,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
4,5,01/22/2020,Gansu,Mainland China,1/22/2020 17:00,0.0,0.0,0.0


In [13]:
# Finding countries in corrected_covid_data but not in latitudes

extras = extra_countries(corrected_covid_data['Country/Region'].unique(), latitudes_countries['country'].unique())
print(extras)

['Mainland China' 'Ivory Coast' 'UK' 'Others' ' Azerbaijan'
 'North Ireland' 'Saint Barthelemy' 'Palestine' 'Republic of Ireland'
 'St. Martin' 'occupied Palestinian territory' "('St. Martin',)"
 'Channel Islands' 'Holy See' 'Reunion' 'Curacao' 'Eswatini'
 'Republic of the Congo' 'The Bahamas' 'The Gambia' 'Gambia, The'
 'Bahamas, The' 'Cabo Verde' 'East Timor' 'MS Zaandam'
 'Sao Tome and Principe']


In [0]:
# Removing the countries whose latitude data is unavailable

corrected_covid_data = corrected_covid_data[~corrected_covid_data['Country/Region'].isin(extras)]

**Add latitude and longitude columns for each entry in the Country/Region column**

Also, drop unnecessary columns
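One way to sketch this step is a left merge on the country name. The two small frames below are stand-ins for `corrected_covid_data` and `latitudes_countries`: the coordinates are copied from the latitudes preview, while the `Confirmed` values are made up. This is an alternative to a row-by-row lookup, offered as an illustration rather than as the method this notebook relies on.

```python
import pandas as pd

# Stand-in frames; coordinates come from the latitudes preview,
# the Confirmed values are made up for illustration.
cases = pd.DataFrame({
    'Country/Region': ['Afghanistan', 'Andorra', 'Afghanistan'],
    'Confirmed': [1.0, 1.0, 4.0],
})
coords = pd.DataFrame({
    'country': ['Andorra', 'Afghanistan'],
    'latitude': [42.546245, 33.93911],
    'longitude': [1.601554, 67.709953],
})

# A left merge keeps every case row and attaches matching coordinates.
with_coords = (
    cases.merge(coords, how='left', left_on='Country/Region', right_on='country')
         .drop(columns='country')
         .rename(columns={'latitude': 'Latitude', 'longitude': 'Longitude'})
)

print(with_coords)
```

With `how='left'`, a country that is missing from the coordinate table would keep its row and receive `NaN` coordinates instead of raising a `KeyError`.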

In [15]:
latitudes_countries.head()

Unnamed: 0,country_code,latitude,longitude,country
0,AD,42.546245,1.601554,Andorra
1,AE,23.424076,53.847818,United Arab Emirates
2,AF,33.93911,67.709953,Afghanistan
3,AG,17.060816,-61.796428,Antigua and Barbuda
4,AI,18.220554,-63.068615,Anguilla


In [0]:
req_lat_long = latitudes_countries.loc[:,['country','latitude', 'longitude']]
req_lat_long = req_lat_long.set_index('country')

In [0]:
latitude_cols = []
longitude_cols = []

for i in corrected_covid_data['Country/Region']:
  latitude = req_lat_long.loc[i, 'latitude']
  longitude = req_lat_long.loc[i, 'longitude']
  latitude_cols.append(latitude)
  longitude_cols.append(longitude)

latitude_cols = np.array(latitude_cols)
longitude_cols = np.array(longitude_cols)

In [0]:
corrected_covid_data['Latitude'] = latitude_cols
corrected_covid_data['Longitude'] = longitude_cols

In [19]:
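# 'Lat' and 'Long' are not present in every version of the dataset, so missing labels are ignored
corrected_covid_data = corrected_covid_data.drop(['Lat','Long'], axis = 1, errors = 'ignore')
corrected_covid_data.head()       # Latitude and longitude are now available for every row of the DataFrame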

In [0]:
corrected_covid_data = corrected_covid_data.drop(['Province/State'], axis = 1)
corrected_covid_data.head()

In [0]:
# Obtain the most affected countries, ranked by their summed confirmed cases
def most_affected(X, n = 8):
  """Return the n index labels (countries) with the largest total of 'Confirmed'."""
  # X is expected to be indexed by country, so group on the index level
  totals = X.groupby(level = 0)['Confirmed'].sum()
  return totals.nlargest(n).index.tolist()

In [0]:
# The dataset stores dates as MM/DD/YYYY strings in the 'ObservationDate' column
corrected_covid_data['Date'] =  pd.to_datetime(corrected_covid_data['ObservationDate'],
                              format='%m/%d/%Y')

In [0]:
temperature = pd.read_csv('temperature_dataframe.csv')
temperature.head()

In [0]:
humidity = (temperature['humidity'].groupby(temperature['country']))
temp = temperature['tempC'].groupby(temperature['country'])

In [0]:
temp_corrected_data = temperature.replace(['Barbuda', 'Aruba', "Cote d'Ivoire", 'French Guiana',
       'Greenland', 'Guadeloupe', 'Korea',
       'Macedonia', 'Congo', 'USA', 'UAE', 'UK'], ['Antigua and Barbuda', 'Netherlands', "Côte d'Ivoire", 'France',
                                        'Denmark', 'France', 'South Korea', 'North Macedonia', 'Congo (Kinshasa)', 'United States',
                                        'United Arab Emirates', 'United Kingdom'])
countries_to_drop = ['Eswatini', 'Guam', 'Guernsey', 'Jersey', 'Martinique', 'Mayotte', 'Puerto Rico', 'Reunion']

In [0]:
# Average temperature and humidity per country
avg_stuff = (temp_corrected_data
             .groupby('country')[['tempC', 'humidity']]
             .mean()
             .rename(columns = {'tempC': 'Avg_temp', 'humidity': 'Avg_humidity'}))

In [0]:
temp = []
humidity = []

for i in corrected_covid_data['Country/Region']:
  try:
    # Separate names avoid overwriting the `temperature` DataFrame
    avg_temp = avg_stuff.loc[i, 'Avg_temp']
    avg_humid = avg_stuff.loc[i, 'Avg_humidity']
    temp.append(avg_temp)
    humidity.append(avg_humid)
  except KeyError:
    # No weather data is available for this country
    temp.append(np.nan)
    humidity.append(np.nan)
temp = np.array(temp)
humidity = np.array(humidity)

In [0]:
corrected_covid_data['Average Temperature'] = temp
corrected_covid_data['Average Humidity'] = humidity

In [0]:
corrected_covid_data

In [0]:
corrected_covid_data.set_index(['Country/Region'])

In [0]:
corrected_covid_data.reset_index(drop = True, inplace = True)

In [0]:
corrected_covid_data.set_index('Country/Region', inplace = True)

In [0]:
df_most_affected = corrected_covid_data
affected = most_affected(df_most_affected)

In [0]:
# Print the 8 most affected countries

print(affected)

**Plot for Cases/day vs Dates**

In [0]:
# Cases vs dates
import matplotlib.pyplot as plt
%matplotlib inline

cases_day = df_most_affected
plt.figure(figsize = [20,10])
for i in range(4):
  j = df_most_affected.loc[affected[i], ['Confirmed','Date']]
  j.reset_index(drop = True, inplace = True)
  j = j.groupby(["Date"]).sum()
  plt.plot(j.index, j['Confirmed'], label = str(affected[i]))

plt.xlabel('Dates (20th Jan - 20th Apr)')
plt.ylabel('Confirmed Cases')
plt.title('Trend of Total Confirmed Cases in 4 most affected countries')
plt.legend()
plt.grid()

In [0]:
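def cases_per_day(X):
  """Add a 'Cases per Day' column holding the day-to-day change in the first column of X."""
  cases_day = []
  # Row i stores the increase from day i to day i + 1
  for i in range(len(X.index) - 1):
    cases_day.append(X.iloc[(i + 1), 0] - X.iloc[i, 0])
  # The last day has no following day, so it repeats the final increase
  cases_day.append(X.iloc[(len(X.index) - 1), 0] - X.iloc[(len(X.index) - 2), 0])

  X['Cases per Day'] = np.array(cases_day)

  return X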

In [0]:
# Cases per Day vs Dates

%matplotlib inline
plt.figure(figsize = [20, 10])

for i in range(4):
  j = df_most_affected.loc[affected[i], ['Confirmed','Date']]
  j.reset_index(drop = True, inplace = True)
  j = j.groupby(["Date"]).sum()
  j = cases_per_day(j)
  plt.plot(j.index, j['Cases per Day'], label = affected[i])

plt.xlabel('Dates (20th Jan - 20th Apr)')
plt.ylabel('Confirmed Cases per Day')
plt.title('Rate of Increase in Confirmed Cases in 4 most affected countries')
plt.legend()
plt.grid()

## **Part 1 Inference:**

* The **United States of America (USA), Italy, Spain** and **China** are among the worst-hit countries in the pandemic.
* China managed to **flatten the curve** and prevent a further rise in the number of cases by the beginning of **March 2020**.
* The United States of America has the highest rate of increase in the number of cases. Italy and Spain have managed to curb the spread of the virus and are witnessing a decrease in the rate of new cases.
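
The idea of a flattening curve can also be checked numerically: when daily new cases (the first difference of the cumulative count) keep falling, the cumulative curve levels off. The sketch below uses made-up counts for a single country, and the 3-day smoothing window is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up cumulative confirmed counts for one country, one value per day.
dates = pd.date_range('2020-03-01', periods=8, freq='D')
confirmed = pd.Series([100, 150, 230, 330, 400, 430, 440, 442], index=dates)

# Daily new cases are the first difference of the cumulative series.
new_cases = confirmed.diff().dropna()

# A 3-day rolling mean smooths day-to-day reporting noise.
smoothed = new_cases.rolling(window=3).mean()

# True on each day where the smoothed daily count is lower than the day before.
is_flattening = smoothed.dropna().diff().dropna().lt(0)

print(new_cases.tolist())
print(is_flattening.tolist())
```

A sustained run of `True` values at the end of `is_flattening` corresponds to the levelling-off visible in the cumulative plot.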