# PySpark - Read Json PokeAPI (Custom Schema) 

This project converts a structure (JSON/dict) into a schema (StructType) that is then used to create a DataFrame.<br>

The structure passed to the conversion mirrors the original JSON, except that each attribute's value is replaced by the name of the desired type.<br>

| Spark Type | Json Type | Spark Type | Json Type | Spark Type | Json Type |
| --- | --- | --- | --- | --- | --- |
| DataType | data | NullType | void | StructType | struct |
| StringType | string | BinaryType | binary | MapType | map |
| BooleanType | boolean | DateType | date | ArrayType | array |
| TimestampType | timestamp | DecimalType | decimal | ShortType | short |
| DoubleType | double | FloatType | float | LongType | long |
| ByteType | byte | IntegerType | integer |  |  |
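
As a minimal sketch of this convention, the snippet below writes a small template by hand and checks that every leaf is one of the scalar JSON type names from the table above. The helper `unknown_type_names` and the `KNOWN_TYPE_NAMES` set are hypothetical, introduced only for illustration; they are not part of the project.

```python
# Hypothetical helper, not part of the project: illustrates the template convention only.
KNOWN_TYPE_NAMES = {
    "string", "boolean", "timestamp", "double", "byte", "binary", "date",
    "decimal", "float", "integer", "short", "long",
}

def unknown_type_names(template):
    """Return the sorted leaf values of a template that are not known type names."""
    if isinstance(template, dict):
        leaves = [name for value in template.values() for name in unknown_type_names(value)]
    elif isinstance(template, list):
        leaves = [name for item in template for name in unknown_type_names(item)]
    else:
        leaves = [] if template in KNOWN_TYPE_NAMES else [template]
    return sorted(set(leaves), key=str)

# Same shape as the source JSON, but each value is replaced by the desired type name.
template = {
    "id": "integer",
    "name": "string",
    "forms": [{"name": "string", "url": "string"}],
}

print(unknown_type_names(template))                        # []
print(unknown_type_names({"id": "int", "ok": "boolean"}))  # ['int']
```

A check like this can catch a misspelled type name before the template reaches Spark.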

## References

| Description | URL |
| --- | --- |
| PySpark Read JSON file into DataFrame | https://sparkbyexamples.com/pyspark/pyspark-read-json-file-into-dataframe/#custom-schema |
| PySpark StructType & StructField Explained with Examples | https://sparkbyexamples.com/pyspark/pyspark-structtype-and-structfield/ |
| PySpark Documentation - StructType | https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.types.StructType.html |



In [1]:
from IPython import display as ipy_display
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql import types as T
import requests
import json

In [2]:
spark = SparkSession \
    .builder \
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic") \
    .config("spark.sql.caseSensitive", True) \
    .config("spark.sql.repl.eagerEval.enabled", True) \
    .appName("catch_pokemons") \
    .getOrCreate()

In [3]:
def display(df, n=5):
    return ipy_display.HTML(df.limit(n).toPandas().to_html(index=False))

In [4]:
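# Map each JSON type name (e.g. "integer") to its Spark class name (e.g. "IntegerType").
SPARK_TYPES = {}
for type_name in T.__all__:
    try:
        SPARK_TYPES[getattr(T, type_name).typeName()] = type_name
    except Exception:
        # Skip exported names that do not provide a usable typeName().
        pass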
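
Most JSON names in the table follow one naming rule: the `Type` suffix is dropped from the class name and the rest is lowercased. The sketch below reproduces that rule without Spark, assuming it matches the default behavior of `typeName()`. `approx_type_name` is a hypothetical helper, and classes that deviate from the rule (for example `NullType`, listed as `void` in the table) are not covered.

```python
def approx_type_name(class_name):
    """Approximate the default naming rule: drop the 'Type' suffix and lowercase."""
    if not class_name.endswith("Type"):
        raise ValueError(f"{class_name!r} does not look like a Spark type class name")
    return class_name[: -len("Type")].lower()

# Build a name -> class-name mapping shaped like SPARK_TYPES from a few table entries.
approx_types = {
    approx_type_name(name): name
    for name in ["StringType", "IntegerType", "ArrayType", "StructType"]
}
print(approx_types)
# {'string': 'StringType', 'integer': 'IntegerType', 'array': 'ArrayType', 'struct': 'StructType'}
```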

In [5]:
def dict_to_spark_schema(input_data, parent_key=None, output_schema=None):
    if not bool(output_schema):
        output_schema = {
            "fields": [],
            "type": "struct"
        }
        
    if isinstance(input_data, dict):
        for key, value in input_data.items():
            base = {
                "metadata": {},
                "name": key,
                "nullable": True,
                "type": {}
            }
            base["type"] = dict_to_spark_schema(value, key, base["type"])
            output_schema["fields"].append(base)
    elif isinstance(input_data, list):
        base = {
            "containsNull": True,
            "elementType": {},
            "type": "array"
        }
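        # A list template describes its elements through its first item only;
        # looping over every item would fail on the second pop("fields").
        if input_data:
            base['elementType'] = dict_to_spark_schema(input_data[0], None, base['elementType'])
            output_schema.pop("fields", None)
            output_schema.update(base)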
    else:        
        output_schema = input_data
    return output_schema

def get_spark_struct_object(input_data):
    struct_object = dict_to_spark_schema(input_data)
    return T.StructType.fromJson(struct_object)
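
To show the shape of the dictionary that is handed to `T.StructType.fromJson`, here is a condensed, Spark-free restatement of the conversion. The function name `template_to_schema_json` is hypothetical, and the sketch assumes that a list template is non-empty and describes its elements through its first item.

```python
def template_to_schema_json(template):
    """Convert a type template into the JSON-style dict used to describe a Spark schema."""
    if isinstance(template, dict):
        return {
            "type": "struct",
            "fields": [
                {
                    "name": key,
                    "type": template_to_schema_json(value),
                    "nullable": True,
                    "metadata": {},
                }
                for key, value in template.items()
            ],
        }
    if isinstance(template, list):
        # Assumption: the list is non-empty and its first item is the element template.
        return {
            "type": "array",
            "elementType": template_to_schema_json(template[0]),
            "containsNull": True,
        }
    # Leaf: already a type name such as "string" or "integer".
    return template

schema_json = template_to_schema_json({"id": "integer", "forms": [{"name": "string"}]})
print(schema_json["fields"][1]["type"]["elementType"]["fields"][0])
# {'name': 'name', 'type': 'string', 'nullable': True, 'metadata': {}}
```

With PySpark installed, passing `schema_json` to `T.StructType.fromJson` should yield the corresponding `StructType`.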

In [6]:
def get_schemas(name):
    map_schemas = {
        "pokemon": {
            "abilities": [{
              "ability": {"name": "string", "url": "string"},
              "is_hidden": "boolean",
              "slot": "integer"
            }],
            "base_experience": "integer",
            "forms": [{"name": "string", "url": "string"}],
            "height": "integer",
            "id": "integer",
            "is_default": "boolean",
            "location_area_encounters": "string",
            "name": "string",
            "order": "integer",
            "species": {"name": "string", "url": "string"},
            "types": [{
              "slot": "integer",
              "type": {"name": "string", "url": "string"}
            }],
            "weight": "integer"
        },
        "type": {
            "damage_relations": {
                "double_damage_from": [{"name": "string", "url": "string"}],
                "double_damage_to": [{"name": "string", "url": "string"}],
                "half_damage_from": [{"name": "string", "url": "string"}],
                "half_damage_to": [{"name": "string", "url": "string"}],
                "no_damage_from": [{"name": "string", "url": "string"}],
                "no_damage_to": [{"name": "string", "url": "string"}]
            },
            "id": "integer",
            "name": "string"
        }
    }
    
    return map_schemas[name]

def catch_pokemons(total, limit=10):
    pokemons = []
    for offset in range(0, total, limit):
        if (total - offset) < limit:
            limit = (total - offset)
        response = requests.get(f'https://pokeapi.co/api/v2/pokemon/?limit={limit}&offset={offset}')
        for item in response.json().get('results', []):
            print(item['url'])
            pokemons.append(requests.get(item['url']).json())
    return pokemons

def get_types(total, limit=10):
    types = []
    for offset in range(0, total, limit):
        if (total - offset) < limit:
            limit = (total - offset)
        response = requests.get(f'https://pokeapi.co/api/v2/type/?limit={limit}&offset={offset}')
        for item in response.json().get('results', []):
            print(item['url'])
            types.append(requests.get(item['url']).json())
    return types

def pokeapi_spark(total_items, schema_name):
    struct_object = get_spark_struct_object(get_schemas(schema_name))
    pokeapi_dataset = None
    if schema_name == "pokemon":
        pokeapi_dataset = catch_pokemons(total_items)
    elif schema_name == "type":
        pokeapi_dataset = get_types(total_items)
    df = spark.createDataFrame(pokeapi_dataset, struct_object)
    return df
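
The pagination in `catch_pokemons` and `get_types` requests `limit` items per page and shrinks the last page so that no more than `total` items are fetched. A minimal sketch of that arithmetic, with no network access, is shown below; `page_windows` is a hypothetical helper name.

```python
def page_windows(total, limit=10):
    """Yield (offset, page_size) pairs that cover exactly `total` items."""
    for offset in range(0, total, limit):
        # The last page is truncated so the pages never exceed `total` items.
        yield offset, min(limit, total - offset)

print(list(page_windows(25)))     # [(0, 10), (10, 10), (20, 5)]
print(list(page_windows(10, 4)))  # [(0, 4), (4, 4), (8, 2)]
```

Each pair corresponds to the `offset` and `limit` query parameters of one list request.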

In [7]:
df_pokemons = pokeapi_spark(10, "pokemon")

https://pokeapi.co/api/v2/pokemon/1/
https://pokeapi.co/api/v2/pokemon/2/
https://pokeapi.co/api/v2/pokemon/3/
https://pokeapi.co/api/v2/pokemon/4/
https://pokeapi.co/api/v2/pokemon/5/
https://pokeapi.co/api/v2/pokemon/6/
https://pokeapi.co/api/v2/pokemon/7/
https://pokeapi.co/api/v2/pokemon/8/
https://pokeapi.co/api/v2/pokemon/9/
https://pokeapi.co/api/v2/pokemon/10/


In [8]:
display(df_pokemons, n=5)

abilities,base_experience,forms,height,id,is_default,location_area_encounters,name,order,species,types,weight
"[((overgrow, https://pokeapi.co/api/v2/ability/65/), False, 1), ((chlorophyll, https://pokeapi.co/api/v2/ability/34/), True, 3)]",64,"[(bulbasaur, https://pokeapi.co/api/v2/pokemon-form/1/)]",7,1,True,https://pokeapi.co/api/v2/pokemon/1/encounters,bulbasaur,1,"(bulbasaur, https://pokeapi.co/api/v2/pokemon-species/1/)","[(1, (grass, https://pokeapi.co/api/v2/type/12/)), (2, (poison, https://pokeapi.co/api/v2/type/4/))]",69
"[((overgrow, https://pokeapi.co/api/v2/ability/65/), False, 1), ((chlorophyll, https://pokeapi.co/api/v2/ability/34/), True, 3)]",142,"[(ivysaur, https://pokeapi.co/api/v2/pokemon-form/2/)]",10,2,True,https://pokeapi.co/api/v2/pokemon/2/encounters,ivysaur,2,"(ivysaur, https://pokeapi.co/api/v2/pokemon-species/2/)","[(1, (grass, https://pokeapi.co/api/v2/type/12/)), (2, (poison, https://pokeapi.co/api/v2/type/4/))]",130
"[((overgrow, https://pokeapi.co/api/v2/ability/65/), False, 1), ((chlorophyll, https://pokeapi.co/api/v2/ability/34/), True, 3)]",236,"[(venusaur, https://pokeapi.co/api/v2/pokemon-form/3/)]",20,3,True,https://pokeapi.co/api/v2/pokemon/3/encounters,venusaur,3,"(venusaur, https://pokeapi.co/api/v2/pokemon-species/3/)","[(1, (grass, https://pokeapi.co/api/v2/type/12/)), (2, (poison, https://pokeapi.co/api/v2/type/4/))]",1000
"[((blaze, https://pokeapi.co/api/v2/ability/66/), False, 1), ((solar-power, https://pokeapi.co/api/v2/ability/94/), True, 3)]",62,"[(charmander, https://pokeapi.co/api/v2/pokemon-form/4/)]",6,4,True,https://pokeapi.co/api/v2/pokemon/4/encounters,charmander,5,"(charmander, https://pokeapi.co/api/v2/pokemon-species/4/)","[(1, (fire, https://pokeapi.co/api/v2/type/10/))]",85
"[((blaze, https://pokeapi.co/api/v2/ability/66/), False, 1), ((solar-power, https://pokeapi.co/api/v2/ability/94/), True, 3)]",142,"[(charmeleon, https://pokeapi.co/api/v2/pokemon-form/5/)]",11,5,True,https://pokeapi.co/api/v2/pokemon/5/encounters,charmeleon,6,"(charmeleon, https://pokeapi.co/api/v2/pokemon-species/5/)","[(1, (fire, https://pokeapi.co/api/v2/type/10/))]",190


In [9]:
df_pokemons.select(
    "id",
    "name",
    df_pokemons.types[0].type['name'].alias('type_1'),
    df_pokemons.types[1].type['name'].alias('type_2'),
    "base_experience",
    "weight",
    "height"
)

id,name,type_1,type_2,base_experience,weight,height
1,bulbasaur,grass,poison,64,69,7
2,ivysaur,grass,poison,142,130,10
3,venusaur,grass,poison,236,1000,20
4,charmander,fire,,62,85,6
5,charmeleon,fire,,142,190,11
6,charizard,fire,flying,240,905,17
7,squirtle,water,,63,90,5
8,wartortle,water,,142,225,10
9,blastoise,water,,239,855,16
10,caterpie,bug,,39,29,3
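
The empty `type_2` values above belong to Pokémon with a single type: in this run, indexing `types[1]` past the end of the array produced a null rather than an error. The same lookup on a raw payload could be sketched in plain Python as follows; `nth_type_name` is a hypothetical helper, and the sample dicts are trimmed versions of the records shown earlier.

```python
def nth_type_name(pokemon, index):
    """Return the name of the type at `index`, or None when the Pokémon has fewer types."""
    types = pokemon.get("types", [])
    if index >= len(types):
        return None
    return types[index]["type"]["name"]

bulbasaur = {"name": "bulbasaur", "types": [
    {"slot": 1, "type": {"name": "grass"}},
    {"slot": 2, "type": {"name": "poison"}},
]}
charmander = {"name": "charmander", "types": [{"slot": 1, "type": {"name": "fire"}}]}

print(nth_type_name(bulbasaur, 1))   # poison
print(nth_type_name(charmander, 1))  # None
```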


In [10]:
df_types = pokeapi_spark(10, "type")

https://pokeapi.co/api/v2/type/1/
https://pokeapi.co/api/v2/type/2/
https://pokeapi.co/api/v2/type/3/
https://pokeapi.co/api/v2/type/4/
https://pokeapi.co/api/v2/type/5/
https://pokeapi.co/api/v2/type/6/
https://pokeapi.co/api/v2/type/7/
https://pokeapi.co/api/v2/type/8/
https://pokeapi.co/api/v2/type/9/
https://pokeapi.co/api/v2/type/10/


In [11]:
display(df_types, n=5)

damage_relations,id,name
"([(fighting, https://pokeapi.co/api/v2/type/2/)], [], [], [(rock, https://pokeapi.co/api/v2/type/6/), (steel, https://pokeapi.co/api/v2/type/9/)], [(ghost, https://pokeapi.co/api/v2/type/8/)], [(ghost, https://pokeapi.co/api/v2/type/8/)])",1,normal
"([(flying, https://pokeapi.co/api/v2/type/3/), (psychic, https://pokeapi.co/api/v2/type/14/), (fairy, https://pokeapi.co/api/v2/type/18/)], [(normal, https://pokeapi.co/api/v2/type/1/), (rock, https://pokeapi.co/api/v2/type/6/), (steel, https://pokeapi.co/api/v2/type/9/), (ice, https://pokeapi.co/api/v2/type/15/), (dark, https://pokeapi.co/api/v2/type/17/)], [(rock, https://pokeapi.co/api/v2/type/6/), (bug, https://pokeapi.co/api/v2/type/7/), (dark, https://pokeapi.co/api/v2/type/17/)], [(flying, https://pokeapi.co/api/v2/type/3/), (poison, https://pokeapi.co/api/v2/type/4/), (bug, https://pokeapi.co/api/v2/type/7/), (psychic, https://pokeapi.co/api/v2/type/14/), (fairy, https://pokeapi.co/api/v2/type/18/)], [], [(ghost, https://pokeapi.co/api/v2/type/8/)])",2,fighting
"([(rock, https://pokeapi.co/api/v2/type/6/), (electric, https://pokeapi.co/api/v2/type/13/), (ice, https://pokeapi.co/api/v2/type/15/)], [(fighting, https://pokeapi.co/api/v2/type/2/), (bug, https://pokeapi.co/api/v2/type/7/), (grass, https://pokeapi.co/api/v2/type/12/)], [(fighting, https://pokeapi.co/api/v2/type/2/), (bug, https://pokeapi.co/api/v2/type/7/), (grass, https://pokeapi.co/api/v2/type/12/)], [(rock, https://pokeapi.co/api/v2/type/6/), (steel, https://pokeapi.co/api/v2/type/9/), (electric, https://pokeapi.co/api/v2/type/13/)], [(ground, https://pokeapi.co/api/v2/type/5/)], [])",3,flying
"([(ground, https://pokeapi.co/api/v2/type/5/), (psychic, https://pokeapi.co/api/v2/type/14/)], [(grass, https://pokeapi.co/api/v2/type/12/), (fairy, https://pokeapi.co/api/v2/type/18/)], [(fighting, https://pokeapi.co/api/v2/type/2/), (poison, https://pokeapi.co/api/v2/type/4/), (bug, https://pokeapi.co/api/v2/type/7/), (grass, https://pokeapi.co/api/v2/type/12/), (fairy, https://pokeapi.co/api/v2/type/18/)], [(poison, https://pokeapi.co/api/v2/type/4/), (ground, https://pokeapi.co/api/v2/type/5/), (rock, https://pokeapi.co/api/v2/type/6/), (ghost, https://pokeapi.co/api/v2/type/8/)], [], [(steel, https://pokeapi.co/api/v2/type/9/)])",4,poison
"([(water, https://pokeapi.co/api/v2/type/11/), (grass, https://pokeapi.co/api/v2/type/12/), (ice, https://pokeapi.co/api/v2/type/15/)], [(poison, https://pokeapi.co/api/v2/type/4/), (rock, https://pokeapi.co/api/v2/type/6/), (steel, https://pokeapi.co/api/v2/type/9/), (fire, https://pokeapi.co/api/v2/type/10/), (electric, https://pokeapi.co/api/v2/type/13/)], [(poison, https://pokeapi.co/api/v2/type/4/), (rock, https://pokeapi.co/api/v2/type/6/)], [(bug, https://pokeapi.co/api/v2/type/7/), (grass, https://pokeapi.co/api/v2/type/12/)], [(electric, https://pokeapi.co/api/v2/type/13/)], [(flying, https://pokeapi.co/api/v2/type/3/)])",5,ground


In [12]:
df_types.select(
    "id",
    "name",
    df_types.damage_relations['double_damage_from'][0]["url"].alias('double_damage_from_1'),
    df_types.damage_relations['double_damage_to'][0]["url"].alias('double_damage_to_1'),
    df_types.damage_relations['half_damage_from'][0]["url"].alias('half_damage_from_1'),
    df_types.damage_relations['half_damage_to'][0]["url"].alias('half_damage_to_1'),
    df_types.damage_relations['no_damage_from'][0]["url"].alias('no_damage_from_1'),
    df_types.damage_relations['no_damage_to'][0]["url"].alias('no_damage_to_1')
)

id,name,double_damage_from_1,double_damage_to_1,half_damage_from_1,half_damage_to_1,no_damage_from_1,no_damage_to_1
1,normal,https://pokeapi.c...,,,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...
2,fighting,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,,https://pokeapi.c...
3,flying,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,
4,poison,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,,https://pokeapi.c...
5,ground,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...
6,rock,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,,
7,bug,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,,
8,ghost,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...
9,steel,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,
10,fire,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,https://pokeapi.c...,,
