# Sampling functions

In [1]:
```python
import pandas as pd
import numpy as np
import random
```

In [4]:
```python
econdata = pd.read_csv("../data/econdata.csv")
econdata.head()
```

|  | id | geo_point_2d | geo_shape | clave_cat | delegacion | perimetro | tipo | nom_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19.424781053,-99.1327537959 | {"type": "Polygon", "coordinates": [[[-99.1332... | 307_130_11 | Cuauhtémoc | B | Mercado | Pino Suárez |
| 1 | 1 | 19.4346139576,-99.1413808393 | {"type": "MultiPoint", "coordinates": [[-99.14... | 002_008_01 | Cuautémoc | A | Museo | Museo Nacional de Arquitectura Palacio de Bell... |
| 2 | 2 | 19.4340695945,-99.1306348409 | {"type": "MultiPoint", "coordinates": [[-99.13... | 006_002_12 | Cuautémoc | A | Museo | Santa Teresa |
| 3 | 3 | 19.42489472,-99.12073393 | {"type": "MultiPoint", "coordinates": [[-99.12... | 323_102_06 | Venustiano Carranza | B | Hotel | Balbuena |
| 4 | 4 | 19.42358238,-99.12451093 | {"type": "MultiPoint", "coordinates": [[-99.12... | 323_115_12 | Venustiano Carranza | B | Hotel | real |


In [5]:
```python
econdata.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 230 entries, 0 to 229
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            230 non-null    int64 
 1   geo_point_2d  229 non-null    object
 2   geo_shape     229 non-null    object
 3   clave_cat     230 non-null    object
 4   delegacion    230 non-null    object
 5   perimetro     230 non-null    object
 6   tipo          230 non-null    object
 7   nom_id        229 non-null    object
dtypes: int64(1), object(7)
memory usage: 14.5+ KB
```


## Simple random sampling

Every element of the population has the same probability of being selected for the sample.
<br>

The pandas method `sample(n)` returns `n` randomly chosen elements (rows) of the DataFrame. By default the rows are drawn without replacement.
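
As a self-contained sketch (the toy DataFrame below is hypothetical and only stands in for `econdata`), passing `random_state` makes the draw reproducible, and because `replace=False` is the default, no row is picked twice:

```python
import pandas as pd

# Hypothetical toy data standing in for econdata
df = pd.DataFrame({"id": range(20), "tipo": ["Hotel", "Museo", "Mercado", "Hotel"] * 5})

# Same seed -> same rows; default replace=False -> no repeated rows
s1 = df.sample(n=8, random_state=0)
s2 = df.sample(n=8, random_state=0)

print(s1.index.tolist())
print(s1.equals(s2))       # True: identical draws
print(s1.index.is_unique)  # True: sampled without replacement
```

Fixing `random_state` is handy in a notebook like this one, since otherwise every re-run of the cell shows a different sample.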

In [6]:
```python
# Random sample of 8 elements
aleat_8 = econdata.sample(n=8)
aleat_8
```

|  | id | geo_point_2d | geo_shape | clave_cat | delegacion | perimetro | tipo | nom_id |
|---|---|---|---|---|---|---|---|---|
| 48 | 48 | 19.4454876095,-99.1454023878 | {"type": "Polygon", "coordinates": [[[-99.1457... | 003_045_01 | Cuauhtémoc | B | Mercado | Martínez de la Torre Anexo |
| 108 | 108 | 19.4341817313,-99.1446221837 | {"type": "MultiPoint", "coordinates": [[-99.14... | 002_020_11 | Cuautémoc | B | Museo | Memoria y Tolerancia |
| 82 | 82 | 19.4220369889,-99.120775543 | {"type": "MultiPoint", "coordinates": [[-99.12... | 423_013_18 | Venustiano Carranza | B | Hotel | Cordoba |
| 64 | 64 | 19.44281242,-99.13974599 | {"type": "MultiPoint", "coordinates": [[-99.13... | 003_053_01 | Cuautémoc | B | Hotel | San Martin |
| 104 | 104 | 19.43397933,-99.13044075 | {"type": "MultiPoint", "coordinates": [[-99.13... | 006_002_12 | Cuautémoc | A | Hotel | Palacio |
| 173 | 173 | 19.4314834886,-99.1259717478 | {"type": "MultiPoint", "coordinates": [[-99.12... | 006_026_38 | Cuautémoc | A | Hotel | Soledad |
| 121 | 121 | 19.4303083246,-99.1405735286 | {"type": "MultiPoint", "coordinates": [[-99.14... | 001_043_15 | Cuautémoc | A | Hotel | El Salvador |
| 163 | 163 | 19.4265454033,-99.1224859032 | {"type": "Polygon", "coordinates": [[[-99.1231... | 323_063_05 | Venustiano Carranza | B | Mercado |  |


With `sample(frac)` we instead get a random fraction of the DataFrame's rows.
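
pandas appears to convert `frac` into a row count as `round(frac * len(df))`. For `econdata`, 0.25 × 230 = 57.5, and Python's `round` (round half to even) turns that into 58. This is consistent with the `(58, 8)` shape reported in the output below. Here is a quick check on a hypothetical 230-row frame of the same length:

```python
import pandas as pd

# Hypothetical 230-row frame, same length as econdata
df = pd.DataFrame({"id": range(230)})

frac = 0.25
expected_rows = round(frac * len(df))  # round(57.5) -> 58 (half to even)
sampled = df.sample(frac=frac, random_state=1)

print(expected_rows, sampled.shape)  # 58 (58, 1)
```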

In [7]:
```python
# 25% fraction of the data
frac_25 = econdata.sample(frac=.25)
frac_25.head()
```

|  | id | geo_point_2d | geo_shape | clave_cat | delegacion | perimetro | tipo | nom_id |
|---|---|---|---|---|---|---|---|---|
| 138 | 138 | 19.4330991176,-99.1423784309 | {"type": "MultiPoint", "coordinates": [[-99.14... | 002_024_08 | Cuautémoc | B | Hotel | Marlowe |
| 213 | 213 | 19.432385153,-99.1274363518 | {"type": "MultiPoint", "coordinates": [[-99.12... | 006_018_09 | Cuautémoc | A | Hotel | Nevada |
| 37 | 37 | 19.4271233834,-99.125111772 | {"type": "Polygon", "coordinates": [[[-99.1251... | 323_065_01 | Venustiano Carranza | B | Mercado | Dulceria |
| 142 | 142 | 19.4263681354,-99.1327278126 | {"type": "MultiPoint", "coordinates": [[-99.13... | 006_127_14 | Cuautémoc | A | Hotel | Ambar |
| 47 | 47 | 19.439101835,-99.13175662 | {"type": "MultiPoint", "coordinates": [[-99.13... | 004_081_31 | Cuautémoc | A | Hotel | Bolivia |


In [10]:
```python
frac_25.shape
```

```
(58, 8)
```

## Systematic sampling

A sampling technique in which the elements of the population are selected by following a fixed rule. Typically the rule is to take every k-th element of an ordered list.
<br>

For this example we write a function that draws a systematic sample.

In [15]:
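```python
def systematic_sampling(data, step):
    # Positions 0, step, 2*step, ... up to the last row
    indexes = np.arange(0, len(data), step=step)
    # Select those rows by position (not by index label)
    sample = data.iloc[indexes]
    return sample
```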
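
`systematic_sampling` always starts at row 0, so repeated calls return the same rows. In the usual textbook form of systematic sampling, the first row is instead drawn at random from the first interval of length `step`, and every `step`-th row is taken after that. The following is a minimal sketch of that variant. The function name `systematic_sampling_random_start`, the `seed` parameter and the toy data are hypothetical additions and are not part of the original notebook:

```python
import numpy as np
import pandas as pd

def systematic_sampling_random_start(data, step, seed=None):
    """Hypothetical variant: every `step`-th row, starting at a random offset."""
    rng = np.random.default_rng(seed)
    start = int(rng.integers(0, step))  # random start in [0, step)
    indexes = np.arange(start, len(data), step)
    return data.iloc[indexes]

# Hypothetical toy data
df = pd.DataFrame({"id": range(23)})
s = systematic_sampling_random_start(df, step=5, seed=0)
ids = s["id"].tolist()
print(ids)
```

With a random start, row `i` is selected exactly when `i % step` equals the start. Each row therefore has the same 1/`step` chance of being chosen.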

We call `systematic_sampling` to create the sample, taking every 5th row of the DataFrame (positions 0, 5, 10, ...).
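
The size of a systematic sample follows from the step. With 230 rows and `step=5`, `np.arange(0, 230, 5)` yields positions 0, 5, ..., 225, which is ⌈230 / 5⌉ = 46 rows. Going the other way, a step of `len(data) // n` gives at least `n` rows (sometimes a few more). The helper `step_for_sample_size` below is hypothetical and only illustrates that arithmetic:

```python
import math
import numpy as np

# econdata has 230 rows (see the info() output above)
n_rows, step = 230, 5
indexes = np.arange(0, n_rows, step)  # 0, 5, 10, ..., 225

print(len(indexes), math.ceil(n_rows / step))  # 46 46
print(indexes[-1])                             # 225

# Hypothetical helper: pick a step that yields at least `n` rows
def step_for_sample_size(population_size, n):
    return max(1, population_size // n)

print(step_for_sample_size(230, 46))  # 5
```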

In [16]:
```python
sample = systematic_sampling(econdata, 5)
sample.head()
```

|  | id | geo_point_2d | geo_shape | clave_cat | delegacion | perimetro | tipo | nom_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19.424781053,-99.1327537959 | {"type": "Polygon", "coordinates": [[[-99.1332... | 307_130_11 | Cuauhtémoc | B | Mercado | Pino Suárez |
| 5 | 5 | 19.4263287068,-99.1207277209 | {"type": "MultiPoint", "coordinates": [[-99.12... | 323_161_11 | Venustiano Carranza | B | Hotel | Baño San Tiago |
| 10 | 10 | 19.4441424478,-99.14600807 | {"type": "MultiPoint", "coordinates": [[-99.14... | 003_048_10 | Cuautémoc | B | Hotel | Moctezuma |
| 15 | 15 | 19.42413788,-99.1324515 | {"type": "MultiPoint", "coordinates": [[-99.13... | 307_153_11 | Cuautémoc | B | Hotel | San Lucas |
| 20 | 20 | 19.4357307042,-99.1326583218 | {"type": "MultiPoint", "coordinates": [[-99.13... | 004_098_26 | Cuautémoc | A | Museo | La Caricatura |


## Stratified sampling