# Lesson 2
## Challenge 1
___

Given the dataset of 2020 election results, find the electoral zone with the highest turnout rate in each municipality of each state, so that the result has one electoral zone per municipality per state.

The resulting table must contain:

* State
* Municipality
* Electoral zone
* Number of voters
* Number of abstentions
* Turnout rate

Use only Spark SQL to build this query.


In [1]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.4.1.tar.gz (310.8 MB)
     310.8/310.8 MB 4.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.4.1-py2.py3-none-any.whl size=311285398 sha256=9ec3a5ae49ff001327916ddc1c5c10dd5d8187b5c051b5c2013c8d63cfeb8fac
  Stored in directory: /root/.cache/pip/wheels/0d/77/a3/ff2f74cc9ab41f8f594dabf0579c2a7c6de920d584206e0834
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.4.1


In [2]:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
  .master('local[*]') \
  .appName("Analise de dados de eleições") \
  .config('spark.ui.port', '4050') \
  .getOrCreate()

In [3]:
!git clone https://github.com/michelpf/dataset-brazil-elections-2020-mayor-1st-round

Cloning into 'dataset-brazil-elections-2020-mayor-1st-round'...
remote: Enumerating objects: 326, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 326 (delta 0), reused 3 (delta 0), pack-reused 323
Receiving objects: 100% (326/326), 127.02 MiB | 20.68 MiB/s, done.
Resolving deltas: 100% (262/262), done.
Updating files: 100% (151/151), done.


In [4]:
# Read every CSV file under the dataset folder (semicolon-separated, ISO-8859-1 encoded).
dados = (
    spark.read
    .option("encoding", "iso-8859-1")
    .option("recursiveFileLookup", "true")
    .csv('dataset-brazil-elections-2020-mayor-1st-round/dataset', sep=';', header=True, inferSchema=True)
)

In [5]:
from pyspark.sql import functions as f

In [6]:
dados.createOrReplaceTempView("votacaoView")

In [31]:
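# Sum eligible voters (QT_APTOS) and turnout (QT_COMPARECIMENTO) per state,
# municipality, and zone, skipping the null (NULO) and blank (BRANCO) vote rows.
query_consolidacao = spark\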
    .sql("""
        SELECT SG_UF, NM_MUNICIPIO, NR_ZONA, SUM(QT_APTOS) AS QT_APTOS, SUM(QT_COMPARECIMENTO) AS QT_COMPARECIMENTO
            FROM votacaoView
            WHERE NM_VOTAVEL NOT IN ('NULO','BRANCO')
            GROUP BY SG_UF, NM_MUNICIPIO, NR_ZONA
            ORDER BY SG_UF, NM_MUNICIPIO, NR_ZONA
    """)

query_consolidacao.show(truncate=False)

+-----+--------------------+-------+--------+-----------------+
|SG_UF|NM_MUNICIPIO        |NR_ZONA|QT_APTOS|QT_COMPARECIMENTO|
+-----+--------------------+-------+--------+-----------------+
|AC   |ACRELÂNDIA          |8      |53243   |41175            |
|AC   |ASSIS BRASIL        |6      |38481   |32035            |
|AC   |BRASILÉIA           |6      |81883   |65718            |
|AC   |BUJARI              |9      |67086   |51564            |
|AC   |CAPIXABA            |2      |43040   |34861            |
|AC   |CRUZEIRO DO SUL     |4      |273415  |218363           |
|AC   |EPITACIOLÂNDIA      |6      |70130   |56483            |
|AC   |FEIJÓ               |7      |123470  |92895            |
|AC   |JORDÃO              |5      |33313   |27559            |
|AC   |MANOEL URBANO       |3      |28060   |22236            |
|AC   |MARECHAL THAUMATURGO|4      |45231   |37163            |
|AC   |MÂNCIO LIMA         |4      |62918   |52513            |
|AC   |PLÁCIDO DE CASTRO   |8      |6702

In [32]:
query_consolidacao.createOrReplaceTempView("votacaoViewConsolidado")

In [30]:
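# Turnout rate per zone. The consolidated view already has one row per
# (state, municipality, zone), so MAX simply returns that row's rate.
query_taxa = spark\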
    .sql("""
        SELECT SG_UF, NM_MUNICIPIO, NR_ZONA, MAX(QT_COMPARECIMENTO/QT_APTOS) AS TAXA
            FROM votacaoViewConsolidado
            GROUP BY SG_UF, NM_MUNICIPIO, NR_ZONA
            ORDER BY SG_UF, NM_MUNICIPIO, NR_ZONA
    """)

query_taxa.show(truncate=False)

+-----+--------------------+-------+------------------+
|SG_UF|NM_MUNICIPIO        |NR_ZONA|TAXA              |
+-----+--------------------+-------+------------------+
|AC   |ACRELÂNDIA          |8      |0.7733410964821666|
|AC   |ASSIS BRASIL        |6      |0.8324887606870923|
|AC   |BRASILÉIA           |6      |0.8025841749813759|
|AC   |BUJARI              |9      |0.7686253465700742|
|AC   |CAPIXABA            |2      |0.8099674721189591|
|AC   |CRUZEIRO DO SUL     |4      |0.7986504032331804|
|AC   |EPITACIOLÂNDIA      |6      |0.8054042492513903|
|AC   |FEIJÓ               |7      |0.7523689965173727|
|AC   |JORDÃO              |5      |0.8272746375288926|
|AC   |MANOEL URBANO       |3      |0.7924447612259444|
|AC   |MARECHAL THAUMATURGO|4      |0.821626760407685 |
|AC   |MÂNCIO LIMA         |4      |0.8346260211704123|
|AC   |PLÁCIDO DE CASTRO   |8      |0.8211264453562104|
|AC   |PORTO ACRE          |1      |0.792609865470852 |
|AC   |PORTO WALTER        |4      |0.8609302523

In [34]:
query_taxa.createOrReplaceTempView("votacaoViewConsolidadoTaxa")

In [36]:
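# Join the rates back to the consolidated counts so that each zone shows its
# rate next to its eligible voters and turnout.
query_taxa_votacao = spark\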
    .sql("""
        SELECT v.SG_UF, v.NM_MUNICIPIO, v.NR_ZONA, v.TAXA, c.QT_APTOS, c.QT_COMPARECIMENTO
            FROM votacaoViewConsolidadoTaxa v
            INNER JOIN votacaoViewConsolidado c
            ON v.SG_UF = c.SG_UF
            AND v.NM_MUNICIPIO = c.NM_MUNICIPIO
            AND v.NR_ZONA = c.NR_ZONA
            ORDER BY v.SG_UF, v.NM_MUNICIPIO, v.NR_ZONA
    """)

query_taxa_votacao.show(truncate=False)

+-----+--------------------+-------+------------------+--------+-----------------+
|SG_UF|NM_MUNICIPIO        |NR_ZONA|TAXA              |QT_APTOS|QT_COMPARECIMENTO|
+-----+--------------------+-------+------------------+--------+-----------------+
|AC   |ACRELÂNDIA          |8      |0.7733410964821666|53243   |41175            |
|AC   |ASSIS BRASIL        |6      |0.8324887606870923|38481   |32035            |
|AC   |BRASILÉIA           |6      |0.8025841749813759|81883   |65718            |
|AC   |BUJARI              |9      |0.7686253465700742|67086   |51564            |
|AC   |CAPIXABA            |2      |0.8099674721189591|43040   |34861            |
|AC   |CRUZEIRO DO SUL     |4      |0.7986504032331804|273415  |218363           |
|AC   |EPITACIOLÂNDIA      |6      |0.8054042492513903|70130   |56483            |
|AC   |FEIJÓ               |7      |0.7523689965173727|123470  |92895            |
|AC   |JORDÃO              |5      |0.8272746375288926|33313   |27559            |
|AC 
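
The `TAXA` column can be checked by hand from the two count columns. The sketch below uses the ACRELÂNDIA row from the output above. It also derives the number of abstentions, which the challenge asks for, under the assumption that abstentions equal eligible voters minus turnout; the variable name `qt_abstencoes` is illustrative and no abstention column appears in the queries shown here.

```python
# Values copied from the ACRELÂNDIA row in the output above.
qt_aptos = 53243           # eligible voters
qt_comparecimento = 41175  # voters who turned out

# Turnout rate, as computed in the query: QT_COMPARECIMENTO / QT_APTOS.
taxa = qt_comparecimento / qt_aptos

# Assumption: abstentions = eligible voters - turnout.
qt_abstencoes = qt_aptos - qt_comparecimento

print(f"{taxa:.4f}", qt_abstencoes)  # prints: 0.7733 12068
```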

In [22]:
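# Highest zone-level turnout rate within each municipality.
# The zone that reached this rate is not part of the result.
query_taxa_mun = spark\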
    .sql("""
        SELECT SG_UF, NM_MUNICIPIO, MAX(QT_COMPARECIMENTO/QT_APTOS) AS TAXA
            FROM votacaoViewConsolidado
            GROUP BY SG_UF, NM_MUNICIPIO
            ORDER BY SG_UF, NM_MUNICIPIO
    """)

query_taxa_mun.show(truncate=False)

+-----+--------------------+------------------+
|SG_UF|NM_MUNICIPIO        |TAXA              |
+-----+--------------------+------------------+
|AC   |ACRELÂNDIA          |0.7733410964821666|
|AC   |ASSIS BRASIL        |0.8324887606870923|
|AC   |BRASILÉIA           |0.8025841749813759|
|AC   |BUJARI              |0.7686253465700742|
|AC   |CAPIXABA            |0.8099674721189591|
|AC   |CRUZEIRO DO SUL     |0.7986504032331804|
|AC   |EPITACIOLÂNDIA      |0.8054042492513903|
|AC   |FEIJÓ               |0.7523689965173727|
|AC   |JORDÃO              |0.8272746375288926|
|AC   |MANOEL URBANO       |0.7924447612259444|
|AC   |MARECHAL THAUMATURGO|0.821626760407685 |
|AC   |MÂNCIO LIMA         |0.8346260211704123|
|AC   |PLÁCIDO DE CASTRO   |0.8211264453562104|
|AC   |PORTO ACRE          |0.792609865470852 |
|AC   |PORTO WALTER        |0.8609302523511638|
|AC   |RIO BRANCO          |0.7327113662585573|
|AC   |RODRIGUES ALVES     |0.8151844895298117|
|AC   |SANTA ROSA DO PURUS |0.8393344598

In [25]:
# Highest zone-level turnout rate within each state, sorted from highest to lowest.
query_taxa_uf = spark\
    .sql("""
        SELECT SG_UF, MAX(QT_COMPARECIMENTO/QT_APTOS) AS TAXA
            FROM votacaoViewConsolidado
            GROUP BY SG_UF
            ORDER BY TAXA DESC
    """)

query_taxa_uf.show(truncate=False)

+-----+------------------+
|SG_UF|TAXA              |
+-----+------------------+
|SC   |0.9704597996403802|
|PI   |0.966334326039104 |
|RS   |0.9660810399106327|
|RN   |0.9569467945102463|
|MG   |0.9514207149404217|
|GO   |0.9476133492886584|
|PB   |0.9444004226096143|
|PR   |0.9435291308500477|
|TO   |0.9409193270631871|
|SE   |0.9322449605273243|
|MT   |0.9319316688567674|
|PE   |0.9275620120565273|
|CE   |0.9264072971757703|
|BA   |0.9241085515935626|
|SP   |0.9206493359064593|
|MA   |0.9198401351702625|
|MS   |0.9039275976097929|
|PA   |0.9000839630562553|
|AP   |0.8997778037656414|
|ES   |0.8967150112092236|
+-----+------------------+
only showing top 20 rows
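
The per-municipality maximum above does not say which zone reached it. One possible way to get the table the challenge asks for is a `ROW_NUMBER()` window that ranks zones inside each municipality and keeps the first one. The sketch below is an illustration under stated assumptions, not the author's solution: the query text in `TOP_ZONE_SQL` is meant to be passed to `spark.sql(...)` against the `votacaoViewConsolidado` view, but here it runs on SQLite from Python's standard library, with made-up rows, only so that it is runnable without a Spark session. Abstentions are assumed to be eligible voters minus turnout, ties are broken by the lowest zone number, and the names `SAMPLE_ROWS`, `TOP_ZONE_SQL`, `POSICAO`, and `top_zone_per_municipality` are hypothetical.

```python
import sqlite3

# Made-up consolidated rows:
# (SG_UF, NM_MUNICIPIO, NR_ZONA, QT_APTOS, QT_COMPARECIMENTO)
SAMPLE_ROWS = [
    ("AC", "RIO BRANCO", 1, 1000, 700),
    ("AC", "RIO BRANCO", 9, 2000, 1600),
    ("AC", "BUJARI", 9, 500, 400),
    ("SP", "CAMPINAS", 33, 4000, 3000),
    ("SP", "CAMPINAS", 274, 1000, 900),
]

# Rank the zones of each municipality by turnout rate and keep the best one.
# CAST(... AS DOUBLE) keeps the division fractional in SQLite as well.
TOP_ZONE_SQL = """
    SELECT SG_UF, NM_MUNICIPIO, NR_ZONA,
           QT_COMPARECIMENTO,
           QT_APTOS - QT_COMPARECIMENTO AS QT_ABSTENCOES,  -- assumption
           TAXA
    FROM (
        SELECT SG_UF, NM_MUNICIPIO, NR_ZONA, QT_APTOS, QT_COMPARECIMENTO,
               CAST(QT_COMPARECIMENTO AS DOUBLE) / QT_APTOS AS TAXA,
               ROW_NUMBER() OVER (
                   PARTITION BY SG_UF, NM_MUNICIPIO
                   ORDER BY CAST(QT_COMPARECIMENTO AS DOUBLE) / QT_APTOS DESC,
                            NR_ZONA
               ) AS POSICAO
        FROM votacaoViewConsolidado
    ) AS ranked
    WHERE POSICAO = 1
    ORDER BY SG_UF, NM_MUNICIPIO
"""


def top_zone_per_municipality(rows):
    """Load rows into an in-memory table and return the top zone per municipality."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE votacaoViewConsolidado ("
            "SG_UF TEXT, NM_MUNICIPIO TEXT, NR_ZONA INTEGER, "
            "QT_APTOS INTEGER, QT_COMPARECIMENTO INTEGER)"
        )
        conn.executemany(
            "INSERT INTO votacaoViewConsolidado VALUES (?, ?, ?, ?, ?)", rows
        )
        return conn.execute(TOP_ZONE_SQL).fetchall()
    finally:
        conn.close()


result = top_zone_per_municipality(SAMPLE_ROWS)
for row in result:
    print(row)
```

With the sample rows, RIO BRANCO keeps zone 9 (rate 0.8 against 0.7 for zone 1) and CAMPINAS keeps zone 274 (0.9 against 0.75). Using `RANK()` instead of `ROW_NUMBER()` would return every tied zone rather than a single one.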

