# IBM Capstone Project

### Background and Description of Issue


Coronaviruses are important human and animal pathogens. At the end of 2019, a novel coronavirus was identified as the cause of a cluster of pneumonia cases in Wuhan, a city in the Hubei Province of China. It rapidly spread, resulting in an epidemic throughout China, followed by an increasing number of cases in other countries throughout the world. In February 2020, the World Health Organization designated the disease COVID-19, which stands for coronavirus disease 2019. The virus that causes COVID-19 is designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); previously, it was referred to as 2019-nCoV.


This project examines the current coronavirus pandemic, using geolocation data and the statistics available as of March 27, 2020. The report visualizes and analyzes COVID-19 case data for the United States only. First, we look at the current coronavirus statistics in each state. Then, we use hospital bed data to see which states are most in need of help, whether with extra hospital beds, medical supplies, or healthcare workers.

These results could help decide where to direct resources so that they match each state's needs during the pandemic.


### Current Active Cases in the United States

The goal of this first part is to create a bubble map of the number of active coronavirus cases in each state by utilizing web scraping methods. 

The required libraries are installed first. The data is then extracted and cleaned so that folium can render the bubble map.

In [10]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

#import json # library to handle JSON files

import requests # library to handle requests

#from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors # Matplotlib and associated plotting modules

#from sklearn.cluster import KMeans # import k-means from clustering stage

!conda install -c conda-forge beautifulsoup4 --yes
from bs4 import BeautifulSoup # HTML parsing library used for web scraping

#!conda install -c conda-forge geopy --yes
#from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

print("Libraries imported.")

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [11]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

Solving environment: done

# All requested packages already installed.



In [139]:
data = requests.get('https://www.latlong.net/category/states-236-14.html').text
soup = BeautifulSoup(data, 'html.parser')

stateList = []
latList = []
longList = [] 

# For each row of the table, collect its data cells;
# rows without <td> cells (such as the header row) are skipped.
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if cells:
        stateList.append(cells[0].text)
        latList.append(cells[1].text)
        longList.append(cells[2].text.rstrip('\n'))

for i in range(len(stateList)):
    stateList[i] = stateList[i].split(",")[0]
    latList[i] = int(float(latList[i]))
    longList[i] = int(float(longList[i]))
    
    
df1 = pd.DataFrame({"State": stateList,
                           "Latitude": latList,
                           "Longitude": longList})

df1 = df1.sort_values('State')
#pd.set_option('display.max_colwidth', -1)
df1 = df1.reset_index(drop=True)
df1.at[46, 'State'] = 'Washington'
df1.at[24, 'State'] = 'Missouri'
df1.head()

| | State | Latitude | Longitude |
|---|---|---|---|
| 0 | Alabama | 32 | -86 |
| 1 | Alaska | 66 | -153 |
| 2 | Arizona | 34 | -111 |
| 3 | Arkansas | 34 | -92 |
| 4 | California | 36 | -119 |
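
The row-by-row scraping pattern used for the coordinates table recurs for every data source in this notebook. A minimal sketch of a reusable helper is shown here. The function name `table_rows`, its `min_cells` parameter, and the sample markup are illustrative assumptions, not part of the original notebook.

```python
from bs4 import BeautifulSoup


def table_rows(html, min_cells=1):
    """Return the cell text of each data row in the first <table> of `html`.

    Rows with fewer than `min_cells` <td> cells (header or spacer rows)
    are skipped so that later positional indexing cannot fail.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= min_cells:
            rows.append(cells)
    return rows


# Hypothetical markup standing in for a downloaded page.
sample_html = """
<table>
  <tr><th>Place Name</th><th>Latitude</th><th>Longitude</th></tr>
  <tr><td>Example State, the US</td><td>40.5</td><td>-100.25</td></tr>
  <tr><td>Spacer</td></tr>
</table>
"""

rows = table_rows(sample_html, min_cells=3)
print(rows)
```

With a helper like this, each scraping cell would only need to pick the columns it wants from the returned lists.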


In [140]:
cases = requests.get('https://www.worldometers.info/coronavirus/country/us/').text
soup = BeautifulSoup(cases, 'html.parser')

stateList1 = []
totalCase = []
newCase   = [] 
totalDeath= []
newDeath  = []
activeCase= [] 

# For each row of the table, collect its data cells;
# rows without <td> cells (such as the header row) are skipped.
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if cells:
        stateList1.append(cells[0].text[1:])
        totalCase.append(cells[1].text)
        newCase.append(cells[2].text[1:])
        totalDeath.append(cells[3].text[1:])
        newDeath.append(cells[4].text)
        activeCase.append(cells[5].text[1:])
    
for i in range(len(stateList1)):
    stateList1[i] = str(stateList1[i]).rstrip()
    activeCase[i] = int(activeCase[i].replace(',',''))
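
The cell above removes leading characters with `[1:]` slices and strips commas before converting to `int`, which depends on the exact formatting of the page. A slightly more defensive conversion could look like the sketch below. The helper name `parse_count` is an assumption introduced for illustration.

```python
def parse_count(text):
    """Convert a scraped count such as '+1,044' or ' 7,248 ' to an int.

    Returns None for blank cells so that a missing value stays
    distinguishable from a true zero.
    """
    cleaned = text.strip().lstrip("+").replace(",", "")
    if not cleaned:
        return None
    return int(cleaned)


print(parse_count("+1,044"))
print(parse_count(" 7,248 "))
print(parse_count(""))
```

Applying one helper to every numeric column would also leave the total, new, and death counts as numbers rather than text.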

In [141]:
df2 = pd.DataFrame({"State": stateList1,
                         "Total Cases":totalCase,
                         "New Cases":newCase,
                         "Total Deaths":totalDeath,
                         "New Deaths":newDeath,
                         "Active Cases":activeCase})
df2 = df2[:-7]

df2 = df2.drop(df2.index[35])
df2 = df2.sort_values('State')
df2 = df2.reset_index(drop=True)
df2.head()
                    

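| | State | Total Cases | New Cases | Total Deaths | New Deaths | Active Cases |
|---|---|---|---|---|---|---|
| 0 | Alabama | 935 | 108 | 11 | 2.0 | 924 |
| 1 | Alaska | 114 | 12 | 2 | | 112 |
| 2 | Arizona | 1157 | 238 | 20 | 3.0 | 1134 |
| 3 | Arkansas | 508 | 82 | 7 | 1.0 | 469 |
| 4 | California | 7248 | 1044 | 145 | 14.0 | 7043 |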


In [143]:
#fulldata = pd.merge(df1, df2, left_index=True, right_index=True)
fulldata = pd.merge(df1, df2, on = "State")
fulldata.head()

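| | State | Latitude | Longitude | Total Cases | New Cases | Total Deaths | New Deaths | Active Cases |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 32 | -86 | 935 | 108 | 11 | 2.0 | 924 |
| 1 | Alaska | 66 | -153 | 114 | 12 | 2 | | 112 |
| 2 | Arizona | 34 | -111 | 1157 | 238 | 20 | 3.0 | 1134 |
| 3 | Arkansas | 34 | -92 | 508 | 82 | 7 | 1.0 | 469 |
| 4 | California | 36 | -119 | 7248 | 1044 | 145 | 14.0 | 7043 |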
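
An inner merge on `State` silently drops any state whose name is spelled differently in the two tables. One way to check for this is an outer merge with `indicator=True`, sketched below. The two small frames and their values are toy data made up for illustration, not the scraped tables.

```python
import pandas as pd

# Toy frames with one deliberately mismatched state name (assumed data).
coords = pd.DataFrame({"State": ["Alabama", "Missouri", "Washington State"],
                       "Latitude": [32, 38, 47]})
cases = pd.DataFrame({"State": ["Alabama", "Missouri", "Washington"],
                      "Active Cases": [10, 20, 30]})

# The _merge column records whether each row came from the left frame,
# the right frame, or both.
check = pd.merge(coords, cases, on="State", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["State", "_merge"]]
print(unmatched)
```

Any row listed in `unmatched` points to a name that needs cleaning before the final merge.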


### Bubble Map of COVID-19 Cases in the United States

In [148]:
# Center the map on the contiguous United States (approximate coordinates).
mymap = folium.Map(location=[39.8, -98.6], zoom_start=4)

# Draw one circle per state; folium.Circle takes its radius in meters.
for _, row in fulldata.iterrows():
    folium.Circle(
        location=[float(row['Latitude']), float(row['Longitude'])],
        radius=int(row['Active Cases']) * 10,
        popup=row['State'],
        color='crimson',
        fill=True,
        fill_color='crimson').add_to(mymap)
mymap
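
The map sets each circle's radius in direct proportion to active cases, so the circle's area grows with the square of the case count and the largest states dominate visually. A common alternative is to scale the radius by the square root so that area is proportional to cases. The sketch below shows that alternative; the function name `bubble_radius_m` and the default maximum radius are assumptions, not values from the notebook.

```python
import math


def bubble_radius_m(cases, max_cases, max_radius_m=300_000):
    """Radius in meters such that circle AREA is proportional to `cases`.

    The state with `max_cases` gets `max_radius_m`; all others are scaled
    by the square root of their share of that maximum.
    """
    if cases <= 0 or max_cases <= 0:
        return 0.0
    return max_radius_m * math.sqrt(cases / max_cases)


# A state with a quarter of the maximum cases gets half the maximum radius.
print(bubble_radius_m(25, 100, max_radius_m=1000))
print(bubble_radius_m(100, 100, max_radius_m=1000))
```

The returned value could be passed as the `radius` argument of `folium.Circle` in place of the linear scaling.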

### Analysis on Bubble Map

The bubble map above shows that most of the active cases are currently concentrated on the East Coast. For a clearer view of the numbers, the five states with the most active cases are listed here:

In [164]:
mostcases = fulldata[['State', 'Active Cases']]

mostcases.sort_values('Active Cases', ascending = False).head(5)

| | State | Active Cases |
|---|---|---|
| 30 | New York | 61674 |
| 28 | New Jersey | 16438 |
| 4 | California | 7043 |
| 20 | Michigan | 6309 |
| 19 | Massachusetts | 5686 |


## Data on Hospital Beds

In [169]:
data3 = requests.get('https://www.ahd.com/state_statistics.html').text
soup = BeautifulSoup(data3, 'html.parser')

states = []
hospitals = []
beds = []

# For each table row, read the state, hospital count, and bed count.
# Rows with fewer than three cells (headers, spacers) are skipped, because
# indexing past the end of `cells` raises "IndexError: list index out of range".
# Assumption: the bed count is in the third column, as in the original cells[2] lookup.
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) >= 3:
        states.append(cells[0].text.strip())
        hospitals.append(cells[1].text.strip())
        beds.append(cells[2].text.strip())

# All three lists now have the same length, as pd.DataFrame requires.
df3 = pd.DataFrame({"State": states,
                    "Number of Hospitals": hospitals,
                    "Number of Beds": beds})

df3 = df3.sort_values('State').reset_index(drop=True)
df3
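
Once bed counts are available for each state, one possible way to compare need is to express active cases relative to bed capacity. The sketch below illustrates the idea only: the helper `cases_per_100_beds`, the placeholder state names, and all figures are hypothetical and do not come from the scraped data.

```python
def cases_per_100_beds(active_cases, beds):
    """Active cases per 100 hospital beds, or None when beds is unknown or zero."""
    if beds <= 0:
        return None
    return 100.0 * active_cases / beds


# Hypothetical (active cases, beds) pairs for placeholder states.
example = {
    "State A": (5000, 20000),
    "State B": (1200, 3000),
    "State C": (800, 0),  # missing bed data
}

scored = [(name, cases_per_100_beds(c, b)) for name, (c, b) in example.items()]
# Drop states without usable bed data, then rank by highest load first.
ranked = sorted((item for item in scored if item[1] is not None),
                key=lambda item: item[1], reverse=True)
print(ranked)
```

A ratio like this is only a rough indicator, since it ignores bed occupancy, staffing, and supplies.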