# Extract Datasets

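In this notebook we download the microdata of the ENOE (Encuesta Nacional de Ocupación y Empleo, Mexico's National Survey of Occupation and Employment), which records employment figures and related characteristics of the workforce. We also download the ENDISEG datasets.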

We use `urlopen` from `urllib.request` to download each zipped dataset directly from INEGI and extract it to disk. As long as the URLs have not changed, you should be able to download the CSV files directly from the source and work with them in the same way.
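The pattern described above keeps each ZIP archive in memory: the HTTP response body is wrapped in `io.BytesIO` so that `zipfile.ZipFile` can read it without first writing a temporary file. The minimal sketch below separates the extraction step from the download so it can be tried on any bytes. The helper names `extract_zip_bytes` and `download_and_extract` are hypothetical, and the 60-second timeout is an assumption, not something INEGI requires.

```python
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen
from zipfile import ZipFile


def extract_zip_bytes(data: bytes, dest: str) -> list[str]:
    """Extract a ZIP archive held in memory into `dest` and return its member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with ZipFile(BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_and_extract(url: str, dest: str, timeout: float = 60) -> list[str]:
    """Hypothetical helper: download the ZIP at `url` and extract it into `dest`."""
    with urlopen(url, timeout=timeout) as resp:  # the connection is closed on exit
        return extract_zip_bytes(resp.read(), dest)
```

For example, `download_and_extract(url, "../data/ENOE_N")` would handle one of the quarterly URLs built in the cells below.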

In [2]:
# Import the necessary modules
import pandas as pd
from samplics.estimation import TaylorEstimator
from urllib.request import urlopen
from zipfile import ZipFile
from io import BytesIO


In [3]:
# Download and extract ENOE_N (post-COVID) microdata, 2020-2022
for year in range(2020, 2023):
    for trimestre in range(1, 5):  # trimestre = quarter
        url = f"https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/enoe_n_{year}_trim{trimestre}_csv.zip"

        try:
            with urlopen(url) as resp:
                data = resp.read()
            with ZipFile(BytesIO(data)) as archive:
                archive.extractall("../data/ENOE_N")
        except Exception:  # report the failed quarter and keep going
            print(f"Oops, something went wrong while loading dataset of quarter {trimestre} in year {year}")


Oops, something went wrong while loading dataset of quarter 1 in year 2020
Oops, something went wrong while loading dataset of quarter 2 in year 2020
Oops, something went wrong while loading dataset of quarter 2 in year 2022
Oops, something went wrong while loading dataset of quarter 3 in year 2022
Oops, something went wrong while loading dataset of quarter 4 in year 2022
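Because the loop above only prints a message, the failed quarters have to be read off the output. A minimal sketch, assuming you would rather collect them in a list for later checks: `collect_failures` is a hypothetical helper, and its `fetch` argument stands in for any download function (for instance the `urlopen`/`ZipFile` steps of the cell above), so the bookkeeping can be tried without network access.

```python
from typing import Callable


def collect_failures(
    years: range,
    quarters: range,
    fetch: Callable[[int, int], None],
) -> list[tuple[int, int]]:
    """Call fetch(year, quarter) for every period and return the pairs that raised."""
    failed = []
    for year in years:
        for quarter in quarters:
            try:
                fetch(year, quarter)
            except Exception:  # any download or extraction error marks the period as failed
                failed.append((year, quarter))
    return failed
```

Applied to the ENOE_N downloads, the messages printed above correspond to a result of `[(2020, 1), (2020, 2), (2022, 2), (2022, 3), (2022, 4)]`.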


In [4]:
# Download and extract ENOE (pre-COVID) microdata, 2010-2020
# URL pattern, e.g. https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/2020trim1_csv.zip

for year in range(2010, 2021):
    for trimestre in range(1, 5):  # trimestre = quarter
        url = f"https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/{year}trim{trimestre}_csv.zip"

        try:
            with urlopen(url) as resp:
                data = resp.read()
            with ZipFile(BytesIO(data)) as archive:
                archive.extractall("../data/ENOE")
        except Exception:  # report the failed quarter and keep going
            print(f"Oops, something went wrong while loading dataset of quarter {trimestre} in year {year}")


Oops, something went wrong while loading dataset of quarter 2 in year 2020
Oops, something went wrong while loading dataset of quarter 3 in year 2020
Oops, something went wrong while loading dataset of quarter 4 in year 2020
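The two download cells differ only in the file-name pattern: ENOE_N archives are named `enoe_n_{year}_trim{quarter}_csv.zip`, while the earlier ENOE archives are named `{year}trim{quarter}_csv.zip`, both under the same `microdatos/` path. The sketch below builds either URL from the patterns used in this notebook. The function name `enoe_url` is hypothetical, and the patterns only hold as long as INEGI keeps these URLs unchanged.

```python
BASE_URL = "https://www.inegi.org.mx/contenidos/programas/enoe/15ymas/microdatos/"


def enoe_url(year: int, quarter: int, series: str = "ENOE") -> str:
    """Build the microdata ZIP URL for one quarter, following the patterns in this notebook."""
    if quarter not in range(1, 5):
        raise ValueError(f"quarter must be between 1 and 4, got {quarter}")
    if series == "ENOE_N":
        return f"{BASE_URL}enoe_n_{year}_trim{quarter}_csv.zip"
    if series == "ENOE":
        return f"{BASE_URL}{year}trim{quarter}_csv.zip"
    raise ValueError(f"unknown series: {series!r}")
```

Keeping the patterns in one place means a future change of URL only needs to be fixed once.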


In [5]:
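# Download and extract ENDISEG and ENDISEG web

def extract_dataset(url, folder_name):
    """Download the ZIP file at `url` and extract it into ../data/<folder_name>."""
    try:
        with urlopen(url) as resp:
            data = resp.read()
        with ZipFile(BytesIO(data)) as archive:
            archive.extractall(f"../data/{folder_name}")
    except Exception:  # report the failure instead of stopping the notebook
        print(f"There was an error while extracting the dataset from {url}")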

# ENDISEG WEB
url = "https://www.inegi.org.mx/contenidos/investigacion/endiseg/2022/datosabiertos/conjunto_de_datos_endiseg_web_2022_csv.zip"
extract_dataset(url, "ENDISEG_WEB")

# ENDISEG
url = "https://www.inegi.org.mx/contenidos/programas/endiseg/2021/datosabiertos/conjunto_de_datos_endiseg_2021_csv.zip"
extract_dataset(url, "ENDISEG")
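Since every cell extracts into a folder under `../data/`, a quick way to check what actually landed on disk is to list the CSV files in each folder. This is a minimal sketch with a hypothetical helper name; the folder names are the ones used above, but the layout inside each archive is not documented here, so the search is recursive and ignores the case of the extension.

```python
from pathlib import Path


def list_csv_files(folder: str) -> list[Path]:
    """Return every .csv file under `folder` (searched recursively), sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []  # nothing was extracted into this folder
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".csv")
```

For example, `for name in ("ENOE", "ENOE_N", "ENDISEG", "ENDISEG_WEB"): print(name, len(list_csv_files(f"../data/{name}")))` gives a count per dataset.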

## Notes
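All datasets are now downloaded. Quarter 2 of 2020 fails in both the ENOE and the ENOE_N downloads because it is available through ETOE (Encuesta Telefónica de Ocupación y Empleo, the telephone version of ENOE). The other ENOE errors, quarters 3 and 4 of 2020, are expected as well, since those quarters belong to ENOE_N; likewise, the ENOE_N error for quarter 1 of 2020 is expected because that quarter is in ENOE.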
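The note above implies a simple rule for which survey holds a given quarter within the years downloaded here: the original ENOE through the first quarter of 2020, ETOE for the second quarter of 2020, and ENOE_N from the third quarter of 2020 on. The sketch below encodes that rule as derived only from the download results in this notebook; the function name is hypothetical, and the ETOE microdata are not downloaded here.

```python
def enoe_source(year: int, quarter: int) -> str:
    """Return which survey covers (year, quarter), following the notes in this notebook."""
    if quarter not in range(1, 5):
        raise ValueError(f"quarter must be between 1 and 4, got {quarter}")
    if (year, quarter) <= (2020, 1):  # tuples compare year first, then quarter
        return "ENOE"    # original ENOE series
    if (year, quarter) == (2020, 2):
        return "ETOE"    # telephone version of ENOE
    return "ENOE_N"      # new ENOE series from 2020 Q3 onwards
```

Combined with a URL builder, this rule lets a single loop pick the right series per quarter instead of relying on expected download errors.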