
# Analysis of COVID-19 data in Algeria and Comparison: Algeria & Arab countries


## Introduction

The COVID-19 pandemic in Algeria is part of the worldwide pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was confirmed to have spread to Algeria in February 2020.

This project analyzes COVID-19 data for Algeria: confirmed cases and deaths, with daily and monthly statistics from January to September 2020, first for Algeria and its provinces and then in comparison with Arab countries. It is organized as follows:

* Data Preparation and Cleaning
* Exploratory Analysis and Visualization
* Asking and Answering Questions
  - Part (I): Algeria & its Provinces
  - Part (II): Algeria & Arab countries
* Inferences and Conclusion
* References and Future Work
* Call

In this project, I will try to apply most of what I have learned in this great course, [Data Analysis with Python: Zero to Pandas](http://zerotopandas.com), to the analysis of COVID-19 data in my country, Algeria, and to compare it with Arab countries, in the hope of producing results that may benefit Algeria and the world in the future.

# Project Title

In [None]:
project_name = "Analysis of COVID-19 data in Algeria and Comparison: Algeria & Arab countries" 

In [None]:
!pip install jovian --upgrade -q

In [None]:
import jovian

In [None]:
jovian.commit(project=project_name)

# Data Preparation and Cleaning

### Importing  libraries

* pandas
* matplotlib
* seaborn
* numpy
* jovian

In [None]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import numpy as np

### Configuring styles

In [None]:
sns.set_style("darkgrid")
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

### Algeria location in Africa; map

In [None]:
from PIL import Image
img = Image.open('algeria_north_africa.jpg') 
plt.grid(False)
plt.title('Algeria')
plt.axis('off')
plt.imshow(img);
# Source: https://www.mapsland.com/africa/algeria/detailed-location-map-of-algeria

### Reading a file about daily Covid-19 data in Algeria using Pandas

In [None]:
covid_dz = pd.read_csv('algeria-covid-data.csv')
covid_dz

By looking at the data frame:

- The file provides daily Covid-19 counts for Algeria in three columns: `date`, `new_cases`, and `new_deaths`.
- Data is provided for 275 days: from Dec 31, 2019 to Sep 30, 2020.

### View some basic information about the data frame

In [None]:
covid_dz.info()

It appears that each column contains values of a specific data type.

### View some statistical information

In [None]:
covid_dz.describe()

For new deaths, the mean is about 6 per day, the standard deviation is about 5, the minimum is 0, and the maximum is 42.

### List of columns

In [None]:
covid_dz.columns.tolist()

### Number of days (dates) in the data frame 

In [None]:
covid_dz.shape[0]
print('There are {} days in the dataset'.format(covid_dz.shape[0]))

### A random sample of rows from the data frame

In [None]:
covid_dz.sample(10)

### List of dates

In [None]:
covid_dz['date']

There are 275 days: from Dec 31, 2019 to Sep 30, 2020

### List of New cases

In [None]:
covid_dz.new_cases

### List of New deaths

In [None]:
covid_dz.new_deaths

### Number of missing new cases

In [None]:
new_cases_missing = covid_dz.new_cases.isna().sum()
print('There are {} missing new cases in the dataset'.format(new_cases_missing))

### Number of missing new deaths

In [None]:
new_deaths_missing = covid_dz.new_deaths.isna().sum()
print('There are {} missing new deaths in the dataset'.format(new_deaths_missing))
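
The two checks above count missing values one column at a time. As a minimal sketch, the same counts can be obtained for every column in one call with `DataFrame.isna().sum()`. The `toy` frame below is an invented stand-in for `covid_dz`, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for covid_dz; the values are invented for illustration only.
toy = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"],
    "new_cases": [1.0, np.nan, 3.0],
    "new_deaths": [0.0, 0.0, np.nan],
})

# isna() marks missing entries; sum() then counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real data frame, `covid_dz.isna().sum()` would report the same two numbers printed by the cells above, plus the count for `date`.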

### Compare the new cases vs. new deaths

In [None]:
compare_dz = covid_dz[['new_cases','new_deaths']]
compare_dz

### Date of the first case

In [None]:
covid_first_case = covid_dz.loc[55:60]
covid_first_case

The first case was on Feb 26, 2020

### Date of first death

In [None]:
covid_first_death = covid_dz.loc[65:75]
covid_first_death 

The first death was on March 13, 2020
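
The two cells above find these dates by inspecting hand-picked row ranges (`loc[55:60]` and `loc[65:75]`). A small sketch of how the same dates could be located without knowing the row numbers in advance is shown here. The helper `first_nonzero_date` is a hypothetical name, and the `toy` frame is invented sample data with the same column names as `covid_dz`.

```python
import pandas as pd

# Toy stand-in for covid_dz; the values are invented for illustration only.
toy = pd.DataFrame({
    "date": ["2020-02-24", "2020-02-25", "2020-02-26", "2020-02-27", "2020-02-28"],
    "new_cases": [0, 0, 1, 0, 2],
    "new_deaths": [0, 0, 0, 0, 1],
})

def first_nonzero_date(df, column):
    """Return the date of the first row where `column` is greater than zero."""
    nonzero = df[df[column] > 0]
    return nonzero["date"].iloc[0] if not nonzero.empty else None

first_case_date = first_nonzero_date(toy, "new_cases")
first_death_date = first_nonzero_date(toy, "new_deaths")
print(first_case_date, first_death_date)
```

This assumes the rows are already sorted by date, as they appear to be in the file.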

### Dates with the highest new cases

In [None]:
highest_cases = covid_dz.sort_values('new_cases', ascending=False).head(10)
highest_cases

* All ten dates with the highest new cases are in July
* July 25, 2020 was the day with the highest number of new cases overall

### Dates with the highest new deaths

In [None]:
highest_deaths = covid_dz.sort_values('new_deaths', ascending=False).head(10)
highest_deaths

* All ten dates with the highest new deaths are in April
* April 04, 2020 was the day with the highest number of new deaths overall

### Dates before & after April 04, 2020

In [None]:
covid_dz.loc[90:100]

The number of deaths was low, then suddenly increased to 42, and then gradually decreased

### Dates with the lowest new cases

In [None]:
lowest_cases = covid_dz[covid_dz.new_cases > 0].sort_values('new_cases').head(10)
lowest_cases

* Most dates with the lowest new cases are in March
* Feb 26 and March 14, 2020 were the days with the fewest new cases overall

### Dates with the lowest new deaths

In [None]:
lowest_deaths = covid_dz[covid_dz.new_deaths > 0].sort_values('new_deaths').head(10)
lowest_deaths

* Most dates with the lowest new deaths are in March
* March 13, 14, 18, 19, 20, and 28, 2020 were the days with the fewest new deaths overall

In [None]:
import jovian

In [None]:
jovian.commit()

# Exploratory Analysis and Visualization



### The number of total cases & total deaths in Algeria

In [None]:
total_cases = covid_dz.new_cases.sum()
total_deaths = covid_dz.new_deaths.sum()
print('The number of total cases is {} and the number of total deaths is {} in Algeria; Until September 30, 2020 '.format(int(total_cases), int(total_deaths)))

*Pie chart of the number of total cases & total deaths in Algeria*

In [None]:
labels = 'total cases', 'total deaths'
total = [total_cases, total_deaths]  # reuse the sums computed above instead of hard-coded values
explode = (0, 0) 

fig1, ax1 = plt.subplots()
ax1.pie(total, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')  
ax1.set_title("COVID-19 Total Cases & Total Deaths in Algeria\n")

plt.show()

### The average of daily new cases & new deaths in Algeria

In [None]:
avrg_cases = covid_dz.new_cases.mean()
avrg_deaths = covid_dz.new_deaths.mean()
print('The average number of daily new cases is {} and the average number of daily new deaths is {} in Algeria.'.format(int(avrg_cases), int(avrg_deaths)))

### The overall death rate in Algeria

In [None]:
death_rate = covid_dz.new_deaths.sum() / covid_dz.new_cases.sum()
print("The overall death rate in Algeria is {:.2f} %.".format(death_rate*100))

### List of high new cases

In [None]:
high_new_cases = covid_dz.new_cases > 600
covid_dz[high_new_cases]

The highest new cases are greater than 600

### List of high new deaths

In [None]:
high_new_deaths = covid_dz.new_deaths > 20
covid_dz[high_new_deaths]

The highest new deaths are between 20 and 42

### Extract dates into separate columns: year, month, day & weekday

In [None]:
covid_dz['year'] = pd.DatetimeIndex(covid_dz.date).year
covid_dz['month'] = pd.DatetimeIndex(covid_dz.date).month
covid_dz['day'] = pd.DatetimeIndex(covid_dz.date).day
covid_dz['weekday'] = pd.DatetimeIndex(covid_dz.date).weekday # weekday: the day of the week with Monday=0, Sunday=6
covid_dz
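
The cell above builds a `DatetimeIndex` four times. As an alternative sketch, the dates can be parsed once with `pd.to_datetime` and the parts read through the `.dt` accessor. The three dates below are just sample inputs, not a claim about which rows the file contains.

```python
import pandas as pd

# Three sample dates used only to illustrate the accessor.
toy = pd.DataFrame({"date": ["2019-12-31", "2020-02-26", "2020-07-25"]})

dates = pd.to_datetime(toy["date"])   # parse the strings once
toy["year"] = dates.dt.year
toy["month"] = dates.dt.month
toy["day"] = dates.dt.day
toy["weekday"] = dates.dt.weekday     # Monday=0, Sunday=6
print(toy)
```

Both approaches give the same columns; parsing once mainly avoids repeated work on larger files.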

###  The month with the highest number of new deaths: total cases & total deaths

In [None]:
print('July')
covid_dz[covid_dz.month == 7][['new_cases', 'new_deaths']].sum()

### Summarize the daywise data and create a new data frame with month-wise data

In [None]:
covid_month_dz = covid_dz.groupby('month')[['new_cases', 'new_deaths']].sum()
covid_month_dz
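
Because the data starts on Dec 31, 2019, grouping by `month` alone puts that day in a group labelled 12, which sorts after September 2020. One possible way to keep the calendar order, sketched here on invented sample rows, is to group by a year-month period instead.

```python
import pandas as pd

# Invented sample rows with the same column names as covid_dz.
toy = pd.DataFrame({
    "date": ["2019-12-31", "2020-01-01", "2020-01-02", "2020-02-01"],
    "new_cases": [0, 1, 2, 5],
    "new_deaths": [0, 0, 1, 1],
})

# Convert each date to a monthly period such as 2020-01, then group on it.
year_month = pd.to_datetime(toy["date"]).dt.to_period("M")
monthly = toy.groupby(year_month)[["new_cases", "new_deaths"]].sum()

# Plain string labels are easier to use as plot tick labels.
monthly.index = monthly.index.astype(str)
print(monthly)
```

The rest of the notebook keeps the simpler month-number grouping; this is only a variant to consider if the data ever spans more than one year.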

### The monthly averages

In [None]:
covid_month_avrg_dz = covid_dz.groupby('month')[['new_cases', 'new_deaths']].mean()
covid_month_avrg_dz

### Calculate the cumulative sum of cases & deaths. Let's add 2 new columns: `total_cases`, `total_deaths`

In [None]:
covid_dz['total_cases'] = covid_dz.new_cases.cumsum()
covid_dz['total_deaths'] = covid_dz.new_deaths.cumsum()
covid_dz
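
If the daily columns contain missing values, `cumsum()` leaves a gap at those rows while still carrying the total forward afterwards. The sketch below, on an invented series, shows that behaviour and one possible workaround, which is to treat missing days as zero before accumulating. Whether that assumption is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Invented daily counts with one missing day.
new_cases = pd.Series([1.0, np.nan, 2.0, 4.0])

raw_total = new_cases.cumsum()                 # NaN stays at the missing row
running_total = new_cases.fillna(0).cumsum()   # missing day counted as zero

print(raw_total.tolist())
print(running_total.tolist())
```

In both versions the last value matches `new_cases.sum()`, since `sum()` also skips missing entries.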

### Plot the new cases per day 

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Cases in Algeria')
covid_dz.new_cases.plot(color='purple');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Cases in Algeria')
covid_dz.new_cases.plot(kind='area', color='purple');

* The number of cases was low, then gradually increased, then decreased slightly, then suddenly increased to reach its maximum, then gradually decreased.
* We notice that the disease has two waves of spread:
  - The First: the speed of spread was medium
  - The second: the speed of spread was very fast


In [None]:
plt.figure(figsize=(12,6))
plt.title('New Cases Range in Algeria')
plt.xlabel('New Cases')
plt.ylabel('Values')
plt.hist(covid_dz.new_cases, bins=np.arange(1, 675, 10), color='orchid');

On most days, fewer than 200 new cases were recorded

### Plot the new deaths per day 

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Deaths in Algeria')
covid_dz.new_deaths.plot(color='red');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Deaths in Algeria')
covid_dz.new_deaths.plot(kind='area', color='red');

The number of deaths was low, then it suddenly increased to its maximum, then began to gradually decrease, then began to fluctuate within a range of less than 15

In [None]:
plt.figure(figsize=(12,6))
plt.title('New Deaths Range in Algeria')
plt.xlabel('New Deaths')
plt.ylabel('Values')
plt.hist(covid_dz.new_deaths, bins=np.arange(1, 44, 1), color='hotpink');  # edges 1..43 so the maximum of 42 is included

On most days, fewer than 15 new deaths were recorded
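
The histograms in this section build their bin edges with `np.arange`, which excludes its stop value, so the choice of stop decides whether the largest observation is counted. The sketch below uses `np.histogram` on a few invented values to compare two stop values. Starting the edges at 1 also leaves out days with zero deaths.

```python
import numpy as np

# Invented values: one zero day, some ordinary days, and a peak of 42.
values = np.array([0, 1, 5, 41, 42])

# Edges 1..41: the last bin is [40, 41], so 42 falls outside.
counts_short, edges_short = np.histogram(values, bins=np.arange(1, 42, 1))

# Edges 1..43: the last bin is [42, 43], so 42 is counted.
counts_full, edges_full = np.histogram(values, bins=np.arange(1, 44, 1))

print(counts_short.sum(), counts_full.sum())
```

`plt.hist` passes its `bins` argument to the same binning logic, so the same reasoning applies to the plots.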

### Compare the new cases vs. new deaths

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Cases & Deaths in Algeria')
covid_dz.new_cases.plot(color='purple')
covid_dz.new_deaths.plot(color='red');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Daily New Cases & Deaths in Algeria')
covid_dz.new_cases.plot(kind='area', color='purple')
covid_dz.new_deaths.plot(kind='area', color='red');

The number of deaths is very small compared to the very large number of cases

### compare the total cases vs. total deaths

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total cumulatively: Daily New Cases & New Deaths in Algeria')
covid_dz.total_cases.plot(color='purple')
covid_dz.total_deaths.plot(color='red');

The number of deaths is very small, over all days, which is a good sign.

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total cumulatively: Daily New Cases & New Deaths in Algeria')
covid_dz.total_cases.plot(kind='area', color='purple')
covid_dz.total_deaths.plot(kind='area',color='red');

Deaths remain a small fraction of confirmed cases. This is consistent with most infected people surviving, although the dataset has no recovery column to confirm it

### Plot the new cases per month 

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Cases in Algeria')
covid_month_dz.new_cases.plot(color='blueviolet');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Cases in Algeria')
covid_month_dz.new_cases.plot(kind='area', color='blueviolet');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Cases in Algeria')
covid_month_dz.new_cases.plot(kind='barh', color='blueviolet' );

* Most cases occurred in July & August
* Most cases occurred in the summer season
* That is, Covid-19 also spread in the summer, when temperatures are high

### Plot the new deaths per month 

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Deaths in Algeria')
covid_month_dz.new_deaths.plot(color='orangered');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Deaths in Algeria')
covid_month_dz.new_deaths.plot(kind='area', color='orangered');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Deaths in Algeria')
covid_month_dz.new_deaths.plot(kind='barh', color='orangered');

* Most deaths occurred between April & September
* Deaths were highest in the summer season
* That is, Covid-19 also spread in the summer

### Compare per month: the new cases vs. new deaths

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Cases & New Deaths in Algeria')
covid_month_dz.new_cases.plot(color='blueviolet')
covid_month_dz.new_deaths.plot(color='orangered');

The number of deaths is very small, over all months, which is a good sign.

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Monthly New Cases & New Deaths in Algeria')
covid_month_dz.new_cases.plot(kind='area', color='blueviolet')
covid_month_dz.new_deaths.plot(kind='area', color='orangered');

The monthly view shows the same pattern: deaths are a small share of cases, which suggests (without recovery data to confirm it) that most infected people survived

### Python list showing months, new cases & new deaths

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
new_cases = [1666, 1659, 2182, 3620, 5108, 4640, 12823, 9569, 5313]
new_deaths = [48, 54, 75, 238, 206, 229, 257, 239, 182]
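
The lists above are typed in by hand. As a hedged alternative, they could be derived from a month-indexed frame such as `covid_month_dz`, which keeps the charts in step with the data if the file is updated. The `monthly` frame below is an invented stand-in with a month-number index, and `calendar.month_abbr` supplies the short month names.

```python
import calendar
import pandas as pd

# Invented stand-in for covid_month_dz: month numbers as the index.
monthly = pd.DataFrame(
    {"new_cases": [3, 10, 7], "new_deaths": [0, 1, 2]},
    index=pd.Index([1, 2, 3], name="month"),
)

# Turn month numbers into short names and columns into plain lists.
months = [calendar.month_abbr[int(m)] for m in monthly.index]
new_cases = monthly["new_cases"].tolist()
new_deaths = monthly["new_deaths"].tolist()
print(months, new_cases, new_deaths)
```

On the real data, the month 12 row for Dec 31, 2019 would need to be dropped or moved first so that the order matches the calendar.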

### Line Chart: Monthly New Cases in Algeria

In [None]:
plt.figure(figsize=(12,6))
plt.plot(months, new_cases,'s--b')

plt.xlabel('Months')
plt.ylabel('New Cases')

plt.title("COVID-19 Monthly New Cases in Algeria")
plt.legend(['new cases']);

* The number of cases was low in Jan & Feb
* Gradually increased: Feb to May
* Decreased slightly: May to Jun
* Suddenly increased to reach its maximum: Jun to Jul
* Gradually decreased: Jul to Sep
* The most cases are in months: July & August
* The most cases are in the summer season

### Line Chart: Monthly New Deaths in Algeria

In [None]:
plt.figure(figsize=(12,6))
plt.plot(months, new_deaths, 'o-r')

plt.xlabel('Months')
plt.ylabel('New Deaths')

plt.title("COVID-19 Monthly New Deaths in Algeria")
plt.legend(['new deaths']);

* The number of deaths was low in Jan & Feb
* Gradually increased: Feb to Mar
* Suddenly increased: Mar to Apr
* Decreased: Apr to May
* Increased to reach its maximum: May to Jul
* Gradually decreased: Jul to Sep
* The most deaths are between April & September
* The most deaths are in the summer season

### Line Chart: Monthly New Cases & New Deaths in Algeria

In [None]:
plt.figure(figsize=(12,6))
plt.plot(months, new_cases, 's--b')
plt.plot(months, new_deaths, 'o-r')

plt.xlabel('Months')
plt.ylabel('Values')

plt.title("COVID-19 Monthly New Cases & New Deaths in Algeria")
plt.legend(['new cases', 'new deaths']);

The number of deaths is very small compared to the number of cases, over all months, which is a good sign.

###  Two Bar Charts: Monthly New Cases & New Deaths in Algeria

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(12, 6), sharey=True)
axs[0].bar(months, new_cases,color='blueviolet')
axs[1].bar(months, new_deaths, color='orangered')
fig.suptitle('COVID-19 Monthly New Cases & New Deaths in Algeria');

With a shared y-axis, the death bars are barely visible next to the case bars, which again suggests that most of those infected survived

### Some selected  Bar  & Pie Charts: Monthly New Cases & New Deaths in Algeria

In [None]:
plt.figure(figsize=(12,6))
plt.title("COVID-19 Monthly New Cases & New Deaths in Algeria")
plt.bar(months, new_cases,color='blueviolet')
plt.bar(months, new_deaths,color='orangered')
plt.legend(['new cases', 'new deaths']);

The number of deaths is very small, over all months, which is a good sign.

In [None]:

data = ((1666, 48), (1659, 54), (2182, 75), (3620, 238), (5108, 206), (4640, 229), (12823, 257), (9569, 239), (5313, 182))

dim = len(data[0])
w = 0.75
dimw = w / dim

fig, ax = plt.subplots()
x = np.arange(len(data))
for i in range(len(data[0])):
    y = [d[i] for d in data]
    b = ax.bar(x + i * dimw, y, dimw, bottom=5)

ax.set_xticks(x + dimw / 2)
ax.set_xticklabels(months)  # label ticks with month names rather than the indices 0-8
ax.set_yscale('log')

ax.set_xlabel('Months')
ax.set_ylabel('Values')
ax.legend(['new cases', 'new deaths'])
ax.set_title("COVID-19 Monthly New Cases & New Deaths in Algeria")

plt.show()

The number of deaths, over all months, is very clear here. We can see that it is greater than 100 from Apr to Sep

In [None]:

labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
new_cases = [1666, 1659, 2182, 3620, 5108, 4640, 12823, 9569, 5313]
new_deaths = [48, 54, 75, 238, 206, 229, 257, 239, 182]

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, new_cases, width, label='new cases', color='blueviolet')
rects2 = ax.bar(x + width/2, new_deaths, width, label='new deaths', color='orangered')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Values')
ax.set_title('COVID-19 Monthly New Cases & New Deaths in Algeria\n')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()


def autolabel(rects):
    """Attach a text label above each bar in *rects*, displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')

autolabel(rects1)
autolabel(rects2)

fig.tight_layout()

plt.show()

The exact values for every month are easy to read here. The highest number of new cases was in July (12823), and the highest number of new deaths (257) was in the same month

### Pie Charts: Monthly New Cases in Algeria

In [None]:
# new cases
labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
case_sizes = [1666, 1659, 2182, 3620, 5108, 4640, 12823, 9569, 5313]
explode = (0.2, 0, 0.2, 0, 0.2, 0, 0.2, 0, 0) 

fig1, ax1 = plt.subplots()
ax1.pie(case_sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  
ax1.set_title("COVID-19 Monthly New Cases in Algeria\n")

plt.show()

In [None]:
labels =['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
case_sizes = [1666, 1659, 2182, 3620, 5108, 4640, 12823, 9569, 5313]

fig1, ax1 = plt.subplots()
pie1=plt.pie(case_sizes, labels=labels, autopct='%1.1f%%', radius=2)
pie2=plt.pie([5], colors ='w', radius=1)

ax1.axis('equal')  
ax1.set_title("COVID-19 Monthly New Cases in Algeria\n")
  
plt.show()

Comparing monthly new cases is easy here: July has the largest share.

### Pie Charts: Monthly New Deaths in Algeria

In [None]:
# new deaths
labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
death_sizes = [48, 54, 75, 238, 206, 229, 257, 239, 182]
explode = (0.2, 0, 0.2, 0, 0.2, 0, 0.2, 0, 0) 

fig1, ax1 = plt.subplots()
ax1.pie(death_sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  
ax1.set_title("COVID-19 Monthly New Deaths in Algeria\n")

plt.show()

In [None]:
labels =['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
death_sizes = [48, 54, 75, 238, 206, 229, 257, 239, 182]

fig1, ax1 = plt.subplots()
pie1=plt.pie(death_sizes, labels=labels, autopct='%1.1f%%', radius=2)
pie2=plt.pie([5], colors ='w', radius=1)

ax1.axis('equal')  
ax1.set_title("COVID-19 Monthly New Deaths in Algeria\n")
  
plt.show()

Comparing monthly new deaths is easy here: July has the largest share.

### Some selected charts

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

axes[0,0].set_title('Algeria')
axes[0,0].imshow(img)
axes[0,0].grid(False)
axes[0,0].set_xticks([])
axes[0,0].set_yticks([])

axes[0,1].plot(months, new_cases,'s--b')
axes[0,1].set_xlabel('Months')
axes[0,1].set_ylabel('New Cases')
axes[0,1].set_title("COVID-19 Monthly New Cases in Algeria")
axes[0,1].legend(['new cases']);

axes[0,2].plot(months, new_deaths, 'o-r')
axes[0,2].set_xlabel('Months')
axes[0,2].set_ylabel('New Deaths')
axes[0,2].set_title("COVID-19 Monthly New Deaths in Algeria")
axes[0,2].legend(['new deaths']);

axes[0,3].plot(months, new_cases, 's--b')
axes[0,3].plot(months, new_deaths, 'o-r')
axes[0,3].set_xlabel('Months')
axes[0,3].set_ylabel('Values')
axes[0,3].set_title("Monthly New Cases & New Deaths in Algeria")
axes[0,3].legend(['new cases', 'new deaths']);

axes[1,0].set_title('New Cases Range in Algeria')
axes[1,0].set_xlabel('New Cases')
axes[1,0].set_ylabel('Values')
axes[1,0].hist(covid_dz.new_cases, bins=np.arange(1, 675, 10), color='orchid');

axes[1,1].set_title('New Deaths Range in Algeria')
axes[1,1].set_xlabel('New Deaths')
axes[1,1].set_ylabel('Values')
axes[1,1].hist(covid_dz.new_deaths, bins=np.arange(1, 44, 1), color='hotpink');  # edges 1..43 so the maximum of 42 is included

axes[1,2].set_title("COVID-19 Monthly New Cases")
axes[1,2].bar(months, new_cases, color='blueviolet')
axes[1,2].legend(['new cases']);  # this panel plots new cases only

axes[1,3].set_title("COVID-19 Monthly New Cases & New Deaths in Algeria")
axes[1,3].bar(months, new_cases,color='blueviolet' )
axes[1,3].bar(months, new_deaths, color='orangered')
axes[1,3].legend(['new cases', 'new deaths']);

plt.tight_layout(pad=2);

In [None]:
import jovian

In [None]:
jovian.commit()

# Asking and Answering Questions



## Part (I): Algeria & its Provinces

### Q 1): Is the number of cases & deaths per million people high or low?

*Reading a file about world countries using Pandas*

In [None]:
countries_df = pd.read_csv('countries.csv')
countries_df

*An overview of Algeria data*

In [None]:
countries_df[countries_df.location == "Algeria"]

*Merge this data into our existing data frame about Algeria by adding a 'location' column*

In [None]:
covid_dz['location'] = "Algeria"
covid_dz

*Add the columns from `countries_df` into `covid_dz`*

In [None]:
merged_df = covid_dz.merge(countries_df, on="location")
merged_df

*Calculate new cases & new deaths per million*

In [None]:
merged_df['new_cases_per_million'] = merged_df.new_cases * 1e6 / merged_df.population
merged_df['new_deaths_per_million'] = merged_df.new_deaths * 1e6 / merged_df.population
merged_df

*Highest new cases per million*

In [None]:
highest_cases_per_million_df = merged_df.sort_values('new_cases_per_million', ascending=False).head(10)
highest_cases_per_million_df

*Highest new deaths per million*

In [None]:
highest_deaths_per_million_df = merged_df.sort_values('new_deaths_per_million', ascending=False).head(10)
highest_deaths_per_million_df

* **The number of new cases per million people in Algeria is low: less than 16** 
* **The number of new deaths per million people in Algeria is very low: less than 1**
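
The per-million figures come from merging the country's population onto every daily row and then scaling each count by `1e6 / population`. A minimal sketch of that calculation follows. The population of 40,000,000 and the daily counts are round, hypothetical numbers chosen to make the arithmetic easy to check, not values from `countries.csv`.

```python
import pandas as pd

# Hypothetical daily rows and a hypothetical round population figure.
daily = pd.DataFrame({
    "location": ["Algeria", "Algeria"],
    "new_cases": [300, 600],
    "new_deaths": [4, 8],
})
countries = pd.DataFrame({"location": ["Algeria"], "population": [40_000_000]})

# validate="many_to_one" raises if a country appears twice on the right side.
merged = daily.merge(countries, on="location", how="left", validate="many_to_one")

merged["new_cases_per_million"] = merged["new_cases"] * 1e6 / merged["population"]
merged["new_deaths_per_million"] = merged["new_deaths"] * 1e6 / merged["population"]
print(merged)
```

With these numbers, 600 new cases in a population of 40 million works out to 15 cases per million.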

### Q 2): Has Covid-19 spread throughout all Algeria or only in some regions?

*Reading a file about Algeria provinces data using Pandas*

In [None]:
provinces_df = pd.read_csv('alg-prov-covid.csv') # Latest update: October 02, 2020
provinces_df

*Number of provinces*

In [None]:
provinces_df.shape[0]
print('There are {} provinces in Algeria'.format(provinces_df.shape[0]))

*Plotting provinces data*

In [None]:
plt.figure(figsize=(16,8))
plt.xticks(rotation=90)
plt.title('COVID-19 Total Cases in Algeria Provinces')
sns.barplot(x=provinces_df.province, y=provinces_df.total_cases);

In [None]:
plt.figure(figsize=(16,8))
plt.title('COVID-19 Total Cases in Algeria Provinces')
provinces_df.total_cases.plot(kind='bar', color='darkviolet');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total Cases in Algeria Provinces')
provinces_df.total_cases.plot(kind='area', color='darkviolet');

In [None]:
plt.figure(figsize=(16,8))
plt.xticks(rotation=90)
plt.title('COVID-19 Total Deaths in Algeria Provinces')
sns.barplot(x=provinces_df.province, y=provinces_df.total_deaths);

In [None]:
plt.figure(figsize=(16,8))
plt.title('COVID-19 Total Deaths in Algeria Provinces')
provinces_df.total_deaths.plot(kind='bar', color='tomato');

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total Deaths in Algeria Provinces')
provinces_df.total_deaths.plot(kind='area', color='tomato');

In [None]:
plt.figure(figsize=(12,12))
plt.title('COVID-19 Total Cases in Algeria Provinces')
provinces_df.total_cases.plot(kind='pie');

In [None]:
plt.figure(figsize=(12,12))
plt.title('COVID-19 Total Deaths in Algeria Provinces')
provinces_df.total_deaths.plot(kind='pie');

* **Covid-19 spread throughout all Algeria**
* **Cases appeared in all provinces**

### Q 3): Which province did Covid-19 first appear in?

In [None]:
first_df = provinces_df[['province','first_case']]
first_df

* **Blida is the Algerian province where the first confirmed case of COVID-19 infection appeared, on March 1, 2020**
* **Note: The case confirmed on February 26, 2020 was a foreign worker who had come to Algeria from abroad.**

### Q 4): Which provinces have the highest number of cases & which have the lowest number of cases?

**Provinces with the highest number of cases**

In [None]:
highest_prov_cases = provinces_df.sort_values('total_cases', ascending=False).head(10)
highest_prov_cases

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19, Provinces  with the Highest Cases')
sns.barplot(x=highest_prov_cases.province, y=highest_prov_cases.total_cases);

In [None]:
plt.title('COVID-19, Provinces  with the Highest Cases')
highest_prov_cases.total_cases.plot(kind='barh',color='darkviolet');

**Provinces with the lowest number of cases**

In [None]:
lowest_prov_cases = provinces_df.sort_values('total_cases').head(10)
lowest_prov_cases

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19, Provinces with the Lowest Cases')
sns.barplot(x=lowest_prov_cases.province, y=lowest_prov_cases.total_cases);

In [None]:
plt.title('COVID-19, Provinces  with the Lowest Cases')
lowest_prov_cases.total_cases.plot(kind='barh',color='darkviolet');

### Q 5): Which provinces have the highest number of deaths & which have the lowest number of deaths?

**Provinces with the highest number of deaths**

In [None]:
highest_prov_deaths = provinces_df.sort_values('total_deaths', ascending=False).head(10)
highest_prov_deaths

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19, Provinces with the Highest Deaths')
sns.barplot(x=highest_prov_deaths.province, y=highest_prov_deaths.total_deaths);

In [None]:
plt.title('COVID-19, Provinces with the Highest Deaths')
highest_prov_deaths.total_deaths.plot(kind='barh', color='tomato');

**Provinces with the lowest number of deaths**

In [None]:
lowest_prov_deaths = provinces_df.sort_values('total_deaths').head(10)
lowest_prov_deaths

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19, Provinces with the Lowest Deaths')
sns.barplot(x=lowest_prov_deaths.province, y=lowest_prov_deaths.total_deaths);

Neither Saïda nor Illizi recorded any deaths.

In [None]:
plt.title('COVID-19, Provinces with the Lowest Deaths')
lowest_prov_deaths.total_deaths.plot(kind='barh', color='tomato');

## Part (II): Algeria & Arab countries

In [None]:
img2 = Image.open('arabic-countries-map.jpg')
plt.grid(False)
plt.title('Arab Countries')
plt.axis('off')
plt.imshow(img2);
# Source: https://www.worldatlas.com/articles/arabic-speaking-countries.html

### Q 6): Has Covid-19 spread across all Arab countries or only in Algeria?

*Reading a file about Arab countries data using Pandas*

In [None]:
covid_arabic_countries_df = pd.read_csv('covid-arabic-coun-data.csv') # Latest update: Sep 18, 2020
covid_arabic_countries_df

*Arab countries: Total cases*

In [None]:
plt.figure(figsize=(16,8))
plt.xticks(rotation=75)
plt.title('COVID-19 Total Cases in Arab Countries')
sns.barplot(x=covid_arabic_countries_df.location, y=covid_arabic_countries_df.total_cases);

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total Cases in Arab Countries')
covid_arabic_countries_df.total_cases.plot(kind='bar', color='darkviolet');

In [None]:
plt.figure(figsize=(12,12))
plt.title('COVID-19 Total Cases in Arab Countries')
covid_arabic_countries_df.total_cases.plot(kind='pie');

*Arab countries: Total deaths*

In [None]:
plt.figure(figsize=(16,8))
plt.xticks(rotation=75)
plt.title('COVID-19 Total Deaths in Arab Countries')
sns.barplot(x=covid_arabic_countries_df.location, y=covid_arabic_countries_df.total_deaths);

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total Deaths in Arab Countries')
covid_arabic_countries_df.total_deaths.plot(kind='bar',color='red');

In [None]:
plt.figure(figsize=(12,12))
plt.title('COVID-19 Total Deaths in Arab Countries')
covid_arabic_countries_df.total_deaths.plot(kind='pie');

*Arab countries: Total cases & Total deaths*

In [None]:
plt.figure(figsize=(12,6))
plt.title('COVID-19 Total Cases & Deaths in Arab Countries')
covid_arabic_countries_df.total_cases.plot(kind='area', color='darkviolet')
covid_arabic_countries_df.total_deaths.plot(kind='area',color='red');

*Arab countries: An overview*

In [None]:
combined_df = covid_arabic_countries_df.merge(countries_df, on='location')
combined_df

*Arab countries: Number of countries in each continent*

In [None]:
country_counts_df = combined_df.groupby('continent')[['location']].count()
country_counts_df

In [None]:
plt.title('Arab countries: Number of countries in each continent')
country_counts_df.location.plot(kind='pie');

*Arab countries: Population*

In [None]:
combined_df.population.sum()
print('The total number of the population in the Arab countries: {} '.format(combined_df.population.sum()))

*Arab countries: Sum of population in each continent*

In [None]:
continent_populations_df = combined_df.groupby('continent')[['population']].sum()
continent_populations_df

In [None]:
plt.title('Arab countries: Sum of population in each continent')
continent_populations_df.population.plot(kind='pie');

*Arab countries: Total cases in each continent*

In [None]:
continent_cases_df = combined_df.groupby('continent')[['total_cases']].sum()
continent_cases_df

In [None]:
plt.title('Arab countries: Total cases in each continent')
continent_cases_df.total_cases.plot(kind='pie');

*Arab countries: Total deaths in each continent*

In [None]:
continent_deaths_df = combined_df.groupby('continent')[['total_deaths']].sum()
continent_deaths_df

In [None]:
plt.title('Arab countries: Total deaths in each continent')
continent_deaths_df.total_deaths.plot(kind='pie');

**Covid-19 has spread across all Arab countries, not only Algeria**

### Q 7): Is Algeria among the first 10 Arab countries in the number of cases?

In [None]:
highest_cases_df = covid_arabic_countries_df.sort_values('total_cases', ascending=False).head(10)
highest_cases_df

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19 Arab Countries with the Highest Cases')
sns.barplot(x=highest_cases_df.location, y=highest_cases_df.total_cases);

In [None]:
plt.title('COVID-19 Arab Countries with the Highest Cases')
highest_cases_df.total_cases.plot(kind='barh', color='darkviolet');

**Yes; Algeria is among the first 10 Arab countries in the number of cases. It ranks 10th**
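
The rank is read off the sorted table by eye. As a small sketch, it could also be computed directly with `Series.rank`. The country names other than Algeria and all the counts below are invented placeholders, not values from the Arab countries file.

```python
import pandas as pd

# Invented totals; only the column names follow the notebook.
toy = pd.DataFrame({
    "location": ["Country A", "Country B", "Algeria", "Country C"],
    "total_cases": [500, 300, 200, 100],
})

# Rank 1 is the largest total; ties share the smallest rank.
toy["rank"] = toy["total_cases"].rank(ascending=False, method="min").astype(int)
algeria_rank = int(toy.loc[toy["location"] == "Algeria", "rank"].iloc[0])
print(algeria_rank)
```

The same two lines applied to `total_deaths`, `population`, or any other column would answer the later ranking questions in the same way.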

### Q 8): Is Algeria among the first 10 Arab countries in the number of deaths?

In [None]:
highest_deaths_df = covid_arabic_countries_df.sort_values('total_deaths', ascending=False).head(10)
highest_deaths_df

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('COVID-19 Arab Countries with the Highest Deaths')
sns.barplot(x=highest_deaths_df.location, y=highest_deaths_df.total_deaths);

In [None]:
plt.title('COVID-19 Arab Countries with the Highest Deaths')
highest_deaths_df.total_deaths.plot(kind='barh', color='red');

**Yes; Algeria is among the first 10 Arab countries in the number of deaths. It ranks 4th**

### Q 9): Algeria is one of the first Arab countries on both lists: high 'total_cases & total_deaths'. Do the values have a relationship with the population in Algeria compared to Arab countries?

In [None]:
num_countries = pd.merge(highest_cases_df, highest_deaths_df)
num_countries
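
Calling `pd.merge` with no `on` argument performs an inner join on every column the two frames share, so the result keeps only the countries present in both top-10 lists. The sketch below shows this on two invented three-row frames with placeholder country names.

```python
import pandas as pd

# Invented "top" lists sharing the same columns.
top_cases = pd.DataFrame({
    "location": ["A", "B", "C"],
    "total_cases": [9, 8, 7],
    "total_deaths": [1, 5, 4],
})
top_deaths = pd.DataFrame({
    "location": ["B", "C", "D"],
    "total_cases": [8, 7, 2],
    "total_deaths": [5, 4, 3],
})

# Inner join on all shared columns: only rows found in both frames remain.
both = pd.merge(top_cases, top_deaths)
in_both = both["location"].tolist()
print(in_both)
```

This works here because both frames are slices of the same source table, so matching rows are identical in every column.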

In [None]:
high_ar_population = combined_df.sort_values('population', ascending=False).head(10)
high_ar_population

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Highest Arab Countries in population')
sns.barplot(x=high_ar_population.location, y=high_ar_population.population);

In [None]:
plt.title('Highest Arab Countries in population')
high_ar_population.population.plot(kind='barh', color='blue');

**Algeria is among the first 10 Arab countries in population. It ranks 2nd. This partly explains why it is on both lists: high 'total_cases & total_deaths'**
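
One way to separate the effect of population from the spread itself is to compare countries per million inhabitants. The sketch below shows the idea on hypothetical placeholder values (`toy_df` is not project data); it assumes `population` and `total_cases` columns like those in `combined_df`.

```python
import pandas as pd

# Hypothetical placeholder data; in the notebook this would be combined_df.
toy_df = pd.DataFrame({
    'location': ['Country A', 'Country B', 'Country C'],
    'population': [40_000_000, 10_000_000, 2_000_000],
    'total_cases': [40_000, 30_000, 10_000],
})

# Normalise the absolute count by population size.
toy_df['cases_per_million'] = toy_df['total_cases'] / toy_df['population'] * 1e6

# Ordering by absolute count versus ordering by per-million rate.
absolute_order = toy_df.sort_values('total_cases', ascending=False)['location'].tolist()
per_capita_order = toy_df.sort_values('cases_per_million', ascending=False)['location'].tolist()

print(absolute_order)
print(per_capita_order)
```

In this toy example the most populous country leads in absolute cases but comes last per million, which is the kind of effect the population ranking hints at.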

### Q 10): Is Algeria among the first 10 Arab countries in the number of hospital beds per thousand?

In [None]:
hosp_beds_per_thousand = combined_df.sort_values('hospital_beds_per_thousand', ascending=False).head(10)
hosp_beds_per_thousand

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Highest Arab Countries in the number of hospital beds per thousand')
# Pass x and y as keywords: recent seaborn versions no longer accept them positionally
sns.barplot(x='location', y='hospital_beds_per_thousand', data=hosp_beds_per_thousand);

In [None]:
plt.title('Highest Arab Countries in the number of hospital beds per thousand')
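# Index by country name so the bars are labelled with countries, not row numbers
hosp_beds_per_thousand.set_index('location').hospital_beds_per_thousand.plot(kind='barh', color='gray');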

**Yes; Algeria is among the first 10 Arab countries in the number of hospital beds per thousand. It ranks 8th**

### Q 11): Is Algeria among the first 10 Arab countries in life expectancy?

In [None]:
life_expectancy_df = combined_df.sort_values('life_expectancy', ascending=False).head(10)
life_expectancy_df

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Highest Arab Countries in life expectancy')
# Pass x and y as keywords: recent seaborn versions no longer accept them positionally
sns.barplot(x='location', y='life_expectancy', data=life_expectancy_df);

In [None]:
plt.title('Highest Arab Countries in life expectancy')
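# Index by country name so the slices are labelled with countries, not row numbers
life_expectancy_df.set_index('location').life_expectancy.plot(kind='pie');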

**Yes; Algeria is among the first 10 Arab countries in life expectancy. It ranks 6th**

### Q 12): Is Algeria among the first 10 Arab countries in GDP per capita?

In [None]:
high_ar_gdp_capta = combined_df.sort_values('gdp_per_capita', ascending=False).head(10)
high_ar_gdp_capta

In [None]:
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title('Highest Arab Countries in GDP per capita')
# Pass x and y as keywords: recent seaborn versions no longer accept them positionally
sns.barplot(x='location', y='gdp_per_capita', data=high_ar_gdp_capta);

In [None]:
plt.title('Highest Arab Countries in GDP per capita')
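# Index by country name so the bars are labelled with countries, not row numbers
high_ar_gdp_capta.set_index('location').gdp_per_capita.plot(kind='barh', color='green');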

**Yes; Algeria is among the first 10 Arab countries in GDP per capita. It ranks 9th**

### Some selected charts

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

# Top row: provinces. x and y are passed as keywords, as recent seaborn versions require.
axes[0,0].set_title('COVID-19 Total Cases in Algeria Provinces')
sns.barplot(x='province', y='total_cases', data=provinces_df, ax=axes[0,0]);

axes[0,1].set_title('Provinces with the Highest Cases')
sns.barplot(x='province', y='total_cases', data=highest_prov_cases, ax=axes[0,1]);

axes[0,2].set_title('COVID-19 Total Deaths in Algeria Provinces')
sns.barplot(x='province', y='total_deaths', data=provinces_df, ax=axes[0,2]);

axes[0,3].set_title('Provinces with the Highest Deaths')
sns.barplot(x='province', y='total_deaths', data=highest_prov_deaths, ax=axes[0,3]);

# Bottom row: Arab countries.
axes[1,0].set_title('Arab countries: Total cases in each continent')
continent_cases_df.total_cases.plot(kind='pie', ax=axes[1,0]);

axes[1,1].set_title('Arab countries: Total deaths in each continent')
continent_deaths_df.total_deaths.plot(kind='pie', ax=axes[1,1]);

axes[1,2].set_title('Arab Countries with the Highest Cases')
sns.barplot(x='location', y='total_cases', data=highest_cases_df, ax=axes[1,2]);

axes[1,3].set_title('Arab Countries with the Highest Deaths')
sns.barplot(x='location', y='total_deaths', data=highest_deaths_df, ax=axes[1,3]);

plt.tight_layout(pad=2);

In [None]:
import jovian

In [None]:
jovian.commit()

# Inferences and Conclusion


### Data Preparation and Cleaning

* Data is provided for 275 days: from Dec 31, 2019 to Sep 30, 2020.
* There are 5 missing new cases & 5 missing new deaths in the dataset
* The first case was on Feb 26, 2020. 
* The first death was on March 13, 2020
* All dates with the highest new cases are in July
* July 25, 2020 was the day with the highest number of new cases overall: 675
* All dates with the highest new deaths are in April
* April 4, 2020 was the day with the highest number of new deaths overall: 42
* Most dates with the lowest new cases & new deaths are in March
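
The 275-day figure above can be checked by counting the calendar days in the period. This is a small sketch using `pd.date_range`; the variable names are illustrative and not taken from the notebook.

```python
import pandas as pd

# Every calendar day from Dec 31, 2019 to Sep 30, 2020, both ends included.
period = pd.date_range(start='2019-12-31', end='2020-09-30', freq='D')
num_days = len(period)

# Days elapsed between the start of the data and the first confirmed case (Feb 26, 2020).
days_before_first_case = (pd.Timestamp('2020-02-26') - period[0]).days

print(num_days)
print(days_before_first_case)
```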


### Exploratory Analysis and Visualization

* As of September 30, 2020, the total number of cases in Algeria is 51368 and the total number of deaths is 1726.
* The average number of new cases per day is 190 and the average number of new deaths per day is 6 in Algeria
* The overall death rate in Algeria is 3.36 %
* The highest new cases are greater than 600
* The highest new deaths are between 20 and 42
* July was the month with the highest number of new cases (12823) and new deaths (257)
* We notice that the disease has two waves of spread:
  - The first: the speed of spread was moderate
  - The second: the speed of spread was very fast
* On most days, fewer than 200 new cases were recorded.
* The number of deaths is very small compared to the very large number of cases.
* On most days, fewer than 15 new deaths were recorded.
* The number of deaths is very small across all days & months, which is a good sign.
* Most of those who were infected have recovered.
* Most cases occurred in July & August.
* Most deaths occurred between April & September.
* Most cases & deaths occurred in the summer season.
* That is, Covid-19 also spreads in the summer, when temperatures are high.
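
The death rate listed above follows directly from the two totals. The sketch below reproduces it, and also reproduces the two averages under the assumption that they are daily means over the 270 days with reported values (275 days minus the 5 missing entries). That assumption is mine; it is not stated in the notebook.

```python
# Totals reported in this section (as of Sep 30, 2020).
total_cases = 51368
total_deaths = 1726

# Assumption: averages are taken over days with a reported value (275 days, 5 missing).
days_with_data = 275 - 5

death_rate = total_deaths / total_cases * 100
avg_new_cases = total_cases / days_with_data
avg_new_deaths = total_deaths / days_with_data

print(f"Death rate: {death_rate:.2f} %")
print(f"Average new cases per day: {avg_new_cases:.0f}")
print(f"Average new deaths per day: {avg_new_deaths:.0f}")
```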


### Asking and Answering Questions

#### Part (I): Algeria & its Provinces

* **The number of cases per million people in Algeria is low: less than 16** 
* **The number of deaths per million people in Algeria is very low: less than 1**
* **Covid-19 spread throughout all Algeria.**
* **Cases appeared in all provinces.**
* **Blida is the Algerian province where the first confirmed case of COVID-19 infection appeared on March 1, 2020.**
* **Note: The case confirmed on February 26, 2020 was a foreign worker who had come to Algeria from a foreign country.**

#### Part (II): Algeria & Arab countries

* **Covid-19 spread across all Arab countries, not only Algeria.**
* **Algeria is among the first 10 Arab countries in the number of cases. It ranks 10th**
* **Algeria is among the first 10 Arab countries in the number of deaths. It ranks 4th**
* **Algeria is among the first 10 Arab countries in population. It ranks 2nd. This partly explains why it is on both lists: high 'total_cases & total_deaths'**
* **Algeria is among the first 10 Arab countries in the number of hospital beds per thousand. It ranks 8th**
* **Algeria is among the first 10 Arab countries in life expectancy. It ranks 6th**
* **Algeria is among the first 10 Arab countries in GDP per capita. It ranks 9th**


### Conclusion

The COVID-19 pandemic in Algeria is part of the worldwide pandemic. Blida is the Algerian province where the first confirmed case of COVID-19 infection appeared on March 1, 2020, and the first death was on March 13. The day with the highest number of new cases overall (675) was July 25, 2020, and the day with the highest number of new deaths overall (42) was April 4, 2020.

As of Sep 30, 2020, the total number of cases is 51368 and the total number of deaths is 1726. The overall death rate in Algeria is low (3.36 %), which is a good sign. We noticed that the disease had two waves of spread and that most cases and deaths were in the summer season.

Covid-19 spread throughout all Algeria, cases appeared in all provinces. The highest number of cases & deaths was in Algiers, Blida, Oran, and Sétif.

Algeria is part of the Arab world, and Covid-19 spread across all Arab countries as well. Algeria appears on both the high total_cases and high total_deaths lists, which is partly explained by its large population. However, it ranks among the first 10 Arab countries in several economic, social, and health indicators (GDP per capita, life expectancy, and hospital beds per thousand), which helps explain its low death rate and its large number of recovered people. These are good indicators that Algeria has managed to control the epidemiological situation and that it is on its way to completely eliminating the spread of Covid-19.

### Some selected charts

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

axes[0,0].set_title('Algeria')
axes[0,0].imshow(img)
axes[0,0].grid(False)
axes[0,0].set_xticks([])
axes[0,0].set_yticks([])

axes[0,1].plot(months, new_cases,'s--b')
axes[0,1].set_xlabel('Months')
axes[0,1].set_ylabel('New Cases')
axes[0,1].set_title("COVID-19 Monthly New Cases in Algeria")
axes[0,1].legend(['new cases']);

axes[0,2].plot(months, new_deaths, 'o-r')
axes[0,2].set_xlabel('Months')
axes[0,2].set_ylabel('New Deaths')
axes[0,2].set_title("COVID-19 Monthly New Deaths in Algeria")
axes[0,2].legend(['new deaths']);

axes[0,3].plot(months, new_cases, 's--b')
axes[0,3].plot(months, new_deaths, 'o-r')
axes[0,3].set_xlabel('Months')
axes[0,3].set_ylabel('Values')
axes[0,3].set_title("Monthly New Cases & New Deaths in Algeria")
axes[0,3].legend(['new cases', 'new deaths']);

axes[1,0].set_title('New Cases Range in Algeria')
axes[1,0].set_xlabel('New Cases')
axes[1,0].set_ylabel('Values')
axes[1,0].hist(covid_dz.new_cases, bins=np.arange(1, 675, 10), color='orchid');

axes[1,1].set_title('New Deaths Range in Algeria')
axes[1,1].set_xlabel('New Deaths')
axes[1,1].set_ylabel('Values')
axes[1,1].hist(covid_dz.new_deaths, bins=np.arange(1, 42, 1), color='hotpink');

axes[1,2].set_title("COVID-19 Monthly New Cases")
axes[1,2].bar(months, new_cases, color='blueviolet')
# Only new cases are drawn on this axis, so the legend has a single entry
axes[1,2].legend(['new cases']);

axes[1,3].set_title("COVID-19 Monthly New Cases & New Deaths in Algeria")
axes[1,3].bar(months, new_cases,color='blueviolet' )
axes[1,3].bar(months, new_deaths, color='orangered')
axes[1,3].legend(['new cases', 'new deaths']);

plt.tight_layout(pad=2);

In [None]:
fig, axes = plt.subplots(2, 4, figsize=(20, 10))

# Top row: provinces. x and y are passed as keywords, as recent seaborn versions require.
axes[0,0].set_title('COVID-19 Total Cases in Algeria Provinces')
sns.barplot(x='province', y='total_cases', data=provinces_df, ax=axes[0,0]);

axes[0,1].set_title('Provinces with the Highest Cases')
sns.barplot(x='province', y='total_cases', data=highest_prov_cases, ax=axes[0,1]);

axes[0,2].set_title('COVID-19 Total Deaths in Algeria Provinces')
sns.barplot(x='province', y='total_deaths', data=provinces_df, ax=axes[0,2]);

axes[0,3].set_title('Provinces with the Highest Deaths')
sns.barplot(x='province', y='total_deaths', data=highest_prov_deaths, ax=axes[0,3]);

# Bottom row: Arab countries.
axes[1,0].set_title('Arab countries: Total cases in each continent')
continent_cases_df.total_cases.plot(kind='pie', ax=axes[1,0]);

axes[1,1].set_title('Arab countries: Total deaths in each continent')
continent_deaths_df.total_deaths.plot(kind='pie', ax=axes[1,1]);

axes[1,2].set_title('Arab Countries with the Highest Cases')
sns.barplot(x='location', y='total_cases', data=highest_cases_df, ax=axes[1,2]);

axes[1,3].set_title('Arab Countries with the Highest Deaths')
sns.barplot(x='location', y='total_deaths', data=highest_deaths_df, ax=axes[1,3]);

plt.tight_layout(pad=2);

In [None]:
import jovian

In [None]:
jovian.commit()

# References and Future Work


* Analyzing Tabular Data using Python and Pandas: https://jovian.ml/aakashns/python-pandas-data-analysis
* Our World In Data: https://ourworldindata.org/coronavirus-source-data
* COVID-19 pandemic in Algeria: http://www.wikiwand.com/en/COVID-19_pandemic_in_Algeria

### Future Work

* Analysis of COVID-19 data in Algeria Provinces and Comparison: Daily cases & deaths.

# Call

### " Victims of COVID-19, for the love of humanity and justice, unite: To compel The International Court of Justice to conduct a serious and thorough investigation of possible criminals responsible for the creation and spread of the pandemic around the world. Criminals must pay for their crimes against humanity."

**By Dr. Ameziane Hocine. E.N.S-Constantine; Algeria.**


In [None]:
import jovian

In [None]:
jovian.commit()