### Connecting Google Drive to Colab

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Extracting the dataset from a Google Drive folder

In [4]:
!unzip "/content/drive/MyDrive/Datasets/Monkeypox/latest.csv.zip"

Archive:  /content/drive/MyDrive/Datasets/Monkeypox/latest.csv.zip
  inflating: latest.csv              
  inflating: __MACOSX/._latest.csv   
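The `unzip` call above also extracts `__MACOSX/._latest.csv`, an AppleDouble metadata file that macOS adds when zipping, which is not data. As a sketch, the same extraction can be done from Python with the standard-library `zipfile` module while skipping those entries. The helper name `extract_csvs` and the destination directory are assumptions for illustration, not part of the original notebook.

```python
import os
import zipfile


def extract_csvs(zip_path, dest_dir="."):
    """Hypothetical helper: extract only .csv members, skipping '__MACOSX/' metadata.

    Returns the list of extracted file paths.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # '__MACOSX/._file' entries are macOS metadata, not data files
            if name.startswith("__MACOSX/") or not name.endswith(".csv"):
                continue
            # extract() creates missing parent directories and returns the new path
            extracted.append(archive.extract(name, path=dest_dir))
    return extracted


# Usage in Colab (archive path taken from the cell above):
# extract_csvs("/content/drive/MyDrive/Datasets/Monkeypox/latest.csv.zip", "/content")
```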


### Setting up PySpark in Google Colab

##### Installing the dependencies

In [14]:
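# Spark 2.4 runs on Java 8, so install a headless OpenJDK 8
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
# download and unpack Spark 2.4.4 prebuilt for Hadoop 2.7
!wget -q https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
!tar xf spark-2.4.4-bin-hadoop2.7.tgz
# findspark locates the Spark installation so that pyspark can be imported
!pip install -q findspark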

##### Configuring the environment variables

In [15]:
import os
# point to the JDK and the Spark directory installed above
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.4-bin-hadoop2.7"

# make pyspark "importable"
import findspark
findspark.init('spark-2.4.4-bin-hadoop2.7')
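`findspark.init` makes `pyspark` importable by putting Spark's bundled Python sources on `sys.path`. The sketch below is a simplified illustration of that idea, assuming the standard Spark layout of `$SPARK_HOME/python` plus a `py4j-*-src.zip` under `$SPARK_HOME/python/lib`. The function `add_spark_to_path` is hypothetical and is not findspark's actual implementation.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home=None):
    """Hypothetical helper: prepend Spark's Python libraries to sys.path."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    python_dir = os.path.join(spark_home, "python")
    # Spark ships py4j (the Python <-> JVM bridge) as a versioned source zip
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if not py4j_zips:
        raise FileNotFoundError(f"no py4j zip under {python_dir}/lib")
    added = [python_dir, py4j_zips[0]]
    for path in added:
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

After such a call, `import pyspark` resolves to the copy inside the Spark directory, which is why `SPARK_HOME` has to point at the unpacked folder.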

##### Importing and creating a SparkSession

In [18]:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local")                  # run Spark locally with a single worker thread
    .appName("monkeypoxcolab")
    .config("spark.ui.port", "4050")  # serve the Spark web UI on port 4050
    .getOrCreate()
)

In [19]:
spark

### Loading the data into PySpark

In [28]:
# unzip extracted latest.csv into Colab's working directory (/content), so read it from there
monkeypoxdf = spark.read.csv("/content/latest.csv", sep=",", header=True, inferSchema=True)

##### Showing the column details of the DataFrame

In [32]:
monkeypoxdf.printSchema()

root
 |-- ID: string (nullable = true)
 |-- Status: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- Country_ISO3: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)
 |-- Date_onset: timestamp (nullable = true)
 |-- Date_confirmation: timestamp (nullable = true)
 |-- Symptoms: string (nullable = true)
 |-- Hospitalised (Y/N/NA): string (nullable = true)
 |-- Date_hospitalisation: timestamp (nullable = true)
 |-- Isolated (Y/N/NA): string (nullable = true)
 |-- Date_isolation: timestamp (nullable = true)
 |-- Outcome: string (nullable = true)
 |-- Contact_comment: string (nullable = true)
 |-- Contact_ID: integer (nullable = true)
 |-- Contact_location: string (nullable = true)
 |-- Travel_history (Y/N/NA): string (nullable = true)
 |-- Travel_history_entry: string (nullable = true)
 |-- Travel_history_start: string (nullable = true)
 |-- Travel_
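With `inferSchema=True`, Spark scans each column and picks a type that fits every non-null value, falling back to `string`. That is why `Contact_ID` becomes `integer` and the date columns become `timestamp`, while `Age` stays `string`; a plausible reason, assuming the column mixes ranges or text with plain numbers, is that no numeric type fits all of its values. The sketch below imitates this with the standard library. It is a simplified illustration (only integer, double, timestamp and string are tried, and the `%Y-%m-%d` timestamp pattern is an assumption), not Spark's actual inference code.

```python
import csv
import io
from datetime import datetime


def _parses(parser, value):
    """True if parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Candidate types in the order they are tried (a simplification of Spark's rules).
# The timestamp pattern is an assumption; Spark accepts more formats than this.
CANDIDATES = [
    ("integer", int),
    ("double", float),
    ("timestamp", lambda v: datetime.strptime(v, "%Y-%m-%d")),
]


def infer_schema(csv_text):
    """Return {column: type}: the first candidate fitting every non-empty value, else 'string'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows if row[column]]  # empty cells act as nulls
        schema[column] = "string"
        for type_name, parser in CANDIDATES:
            if values and all(_parses(parser, v) for v in values):
                schema[column] = type_name
                break
    return schema


# Hypothetical rows, not taken from the dataset
sample = "ID,Age,Contact_ID,Date_onset\nN1,20-29,1,2022-05-01\nN2,35,,2022-05-03\n"
print(infer_schema(sample))
```

In this hypothetical sample, a single `"20-29"` is enough to turn the whole `Age` column into `string`, even though `"35"` alone would parse as an integer.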

#### Number of rows in the DataFrame
##### *Total number of cases tracked worldwide, across every status (confirmed, discarded, suspected and omit_error, plus rows with a null status)*

In [33]:
monkeypoxdf.count()

49289

In [37]:
monkeypoxdf.select("Status").distinct().show()

+----------+
|    Status|
+----------+
|omit_error|
|      null|
| suspected|
| confirmed|
| discarded|
+----------+
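`distinct()` lists the status values but not how many rows carry each one, and `null` shows up as a value of its own. In PySpark, `monkeypoxdf.groupBy("Status").count().show()` would give the per-status totals, which add up to the `count()` result above. The plain-Python sketch below shows the same grouping logic on a small hypothetical list, keeping `None` as a separate group just as `groupBy` keeps nulls.

```python
from collections import Counter


def count_by_status(statuses):
    """Count rows per status, keeping None (null) as its own group like groupBy()."""
    counts = Counter(statuses)
    # Sort alphabetically with None last so the printed order is stable
    return sorted(counts.items(), key=lambda kv: (kv[0] is None, kv[0] or ""))


# Hypothetical sample, not taken from the dataset
sample = ["confirmed", "discarded", "confirmed", None, "suspected", "omit_error", "confirmed"]
for status, n in count_by_status(sample):
    print(status, n)
```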



#### Describing the columns

In [38]:
monkeypoxdf.describe().show()

+-------+--------------------+---------+-------------+-------------+-------+------------+-----+------+--------------------+---------------------+-----------------+---------+--------------------+-----------------+--------------------+-----------------------+--------------------+--------------------+-----------------------+----------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------+---------+----------+
|summary|                  ID|   Status|     Location|         City|Country|Country_ISO3|  Age|Gender|            Symptoms|Hospitalised (Y/N/NA)|Isolated (Y/N/NA)|  Outcome|     Contact_comment|       Contact_ID|    Contact_location|Travel_history (Y/N/NA)|Travel_history_entry|Travel_history_start|Travel_history_location|Travel_history_country|   Genomics_Metadata| Confirmation_method|              Source|           Source_II|          Source_III|           Source_IV|Source_V|Source_VI|Source
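`describe()` reports `count`, `mean`, `stddev`, `min` and `max` for numeric and string columns; timestamp columns such as `Date_onset` are left out, which is why they are missing from the header above. For string columns, `min` and `max` are lexicographic, so on a `string`-typed `Age` they compare text, not numbers. The sketch below computes these statistics for a single column in plain Python. It is a simplified illustration (mean and sample standard deviation over the values that can be read as numbers), not Spark's implementation; `describe_column` is a hypothetical name.

```python
import statistics


def describe_column(values):
    """Summarise one column like DataFrame.describe(): count, mean, stddev, min, max."""
    present = [v for v in values if v is not None]
    numbers = []
    for v in present:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            pass  # non-numeric text does not contribute to mean/stddev
    return {
        "count": len(present),
        "mean": statistics.mean(numbers) if numbers else None,
        # sample standard deviation (n - 1 denominator)
        "stddev": statistics.stdev(numbers) if len(numbers) > 1 else None,
        # min/max compare the raw values, so strings are ordered lexicographically
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
```

For the hypothetical values `["20-29", "40", None, "35", "9"]`, `min` is `"20-29"` and `max` is `"9"` rather than `"40"`, because the comparison is textual.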

In [40]:
monkeypoxdf.select("Symptoms").distinct().toPandas()

|     | Symptoms |
|-----|----------|
| 0   | headache, skin lesions |
| 1   | Perianal rash, fever |
| 2   | Fever, chills, fatigue, headache, skin lesions |
| 3   | headache, muscle pain, back pain, vasicular ra... |
| 4   | fever, outbreak on the skin, hands, and chest |
| ... | ... |
| 94  | general weakness, fever, skin rashes |
| 95  | genital ulcers |
| 96  | Fever, skin rashes |
| 97  | Rash, muscle ache, fatigue |
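The distinct values show that `Symptoms` is free text: the same symptom appears with different capitalisation (`Fever` vs `fever`), and several symptoms share one comma-separated cell. A common normalisation is to lowercase the text, split on commas, trim whitespace and count individual symptoms; in PySpark this could be expressed with `pyspark.sql.functions.lower`, `split`, `explode` and `trim`. The plain-Python sketch below applies the same steps to a few values copied from the output above. Counting over the full column instead of the distinct values would give per-case frequencies, and synonyms such as `rash` vs `skin rashes` are not merged, which would need a manual mapping.

```python
from collections import Counter


def count_symptoms(cells):
    """Lowercase, split on commas, trim, and count individual symptoms."""
    counts = Counter()
    for cell in cells:
        if cell is None:
            continue  # rows without a Symptoms value
        for part in cell.lower().split(","):
            symptom = part.strip()
            if symptom:
                counts[symptom] += 1
    return counts


# Values copied from the distinct Symptoms shown above
cells = [
    "headache, skin lesions",
    "Perianal rash, fever",
    "Fever, chills, fatigue, headache, skin lesions",
    "Fever, skin rashes",
    "Rash, muscle ache, fatigue",
]
print(count_symptoms(cells).most_common(3))
```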
