![image](../imgs/databites_header.png)

# 10 Python One-Liners for Sampling and Resampling

Sampling and resampling are core techniques in both statistics and data science.
Whether you are working with a massive dataset or preparing an experiment, being able to draw subsets of data efficiently is critical.


# Preparing the Environment and Loading the Data
- Make sure you have Python 3.13+ and basic SQL/Python skills.
- Create a project directory, set up a virtual environment, and install both numpy and pandas. 


## Installing and importing the required libraries

In [17]:
!pip install pandas numpy


## Getting the dummy data
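The examples use the cleaned Arabica table from the Coffee Quality Database on GitHub, read directly into a pandas DataFrame.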

In [18]:
import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv'
df = pd.read_csv(url)

## 1. Simple Random Sample (Without Replacement)

In [19]:
df.sample(n=100, random_state=1)

Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Color,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
201,202,Arabica,tembo coffee company ltd,"Tanzania, United Republic Of",jacksom mwasenga,,tembo coffee company ltd,C46,tembo coffee company ltd,1620m,...,Green,7,"December 12th, 2015",Africa Fine Coffee Association,073285c0d45e2f5539012d969937e529564fa6fe,c4ab13415cdd69376a93780c0166e7b1a10481ea,m,1620.0000,1620.0000,1620.0000
115,116,Arabica,"lin, che-hao krude 林哲豪",Taiwan,shi fang yuan 十方源,2016 Tainan Coffee Cupping Event Micro Lot 臺南市...,shi fang yuan 十方源,Taiwan,taiwan coffee laboratory,350,...,Green,0,"May 18th, 2017",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,350.0000,350.0000,350.0000
255,256,Arabica,taylor winch (coffee) ltd.,Kenya,-,,,37-0569-2347,taylor winch (coffee) ltd,1650,...,Bluish-Green,0,"May 31st, 2013",Kenya Coffee Traders Association,ccba45b89d859740b749878be8c6d16fbdb96c2e,d752c909a015f3c76224b3c5cc520f8a67afda74,m,1650.0000,1650.0000,1650.0000
1040,1041,Arabica,"comercial internacional exportadora, s.a.",Nicaragua,cafetales santa matilde,,beneficio san carlos,017/001/1066,"comercial internacional exportadora, s.a.",1100.00 mosl,...,Green,5,"May 27th, 2016",Asociación de Cafés Especiales de Nicaragua,fc561dd3c2eee024b032933e0a97b4aede0dc206,f79a8d4dee92a80ff14025f03ea34fa316b2132f,m,110000.0000,110000.0000,110000.0000
195,196,Arabica,juan luis alvarado romero,Guatemala,finca san vicente,11/52/709,beneficio exportacafe agua santa,11/52/709,exportcafe,3607,...,Green,1,"April 6th, 2017",Asociacion Nacional Del Café,b1f20fe3a819fd6b2ee0eb8fdc3da256604f1e53,724f04ad10ed31dbb9d260f0dfd221ba48be8a95,ft,1099.4136,1099.4136,1099.4136
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
919,920,Arabica,"lin, che-hao krude 林哲豪",Taiwan,good mood coffee 馨晴咖啡,,good mood coffee 馨晴咖啡,Taiwan,"red on tree co., ltd.",900 m,...,Blue-Green,0,"August 18th, 2015",Specialty Coffee Association,d3ed2a8c1db69c87daef88f425dd0e8ef3216a39,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,900.0000,900.0000,900.0000
428,429,Arabica,exportadora de cafe condor s.a,Colombia,,,trilladora boananza,3-68-0215,exportadora de cafe condor s.a,1750 msnm,...,Green,1,"August 2nd, 2014",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,1750.0000,1750.0000,1750.0000
585,586,Arabica,"lin, che-hao krude 林哲豪",Taiwan,大鋤花間 (hoe vs. flower coffee farm),2017台南市精品咖啡評鑑批次 Specialty Coffee Evaluation of...,大鋤花間 (hoe vs. flower coffee farm),Taiwan,taiwan coffee laboratory,650,...,Green,0,"June 1st, 2018",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,650.0000,650.0000,650.0000
56,57,Arabica,ibrahim hussien speciality coffee producer &ex...,Ethiopia,,,burka gudina,2014/2015,ibrahim hussien specality coffee product & exp...,1800-2000,...,Green,4,"April 3rd, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1800.0000,2000.0000,1900.0000
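A minimal sketch of what `random_state` and the default `replace=False` guarantee, using a hypothetical toy frame (`toy`) instead of the coffee data so it runs without a network connection: the same seed returns the same rows, and no row appears twice.

```python
import pandas as pd

# Hypothetical stand-in for df: 1,000 rows with a default integer index.
toy = pd.DataFrame({"x": range(1000)})

# Two draws with the same seed select exactly the same rows.
first_draw = toy.sample(n=100, random_state=1)
second_draw = toy.sample(n=100, random_state=1)
same_rows = first_draw.index.equals(second_draw.index)

# Sampling without replacement (the default) never repeats a row.
no_repeats = first_draw.index.is_unique

print(same_rows, no_repeats)
```

Dropping `random_state` gives a different sample on every call, which is usually what you want in production but makes a notebook harder to reproduce.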


## 2. Stratified Sampling by Country

In [20]:
df.groupby('Country.of.Origin', group_keys=False).apply(lambda x: x.sample(min(5, len(x))))


Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Color,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
41,42,Arabica,jacques pereira carneiro,Brazil,pereira estate coffee,,cocarive,002/1352/0045,,1.2,...,,2,"January 4th, 2012",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,12.0,12.0,12.0
765,766,Arabica,bourbon specialty coffees,Brazil,,,,002/4542/0274,bourbon specialty coffees,,...,Green,1,"January 15th, 2016",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,,,
713,714,Arabica,bourbon specialty coffees,Brazil,,,,002/4542/0478,bourbon specialty coffees,,...,Green,10,"April 19th, 2016",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,,,
1114,1115,Arabica,brayan cunha souza,Brazil,,BR5691,cafeco 3,002/1352/0226,carmo coffees,1100,...,Green,5,"November 29th, 2018",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,1100.0,1100.0,1100.0
681,682,Arabica,bourbon specialty coffees,Brazil,,43102245 - P4615,dinamo armazens gerais ltda,002/4542/0190,bourbon specialty coffees,,...,Green,0,"January 11th, 2017",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
554,555,Arabica,royal base corporation,Vietnam,"apollo co., ltd.",,"apollo co., ltd.",,royal base corporation,1040m,...,,17,"July 20th, 2013",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1040.0,1040.0,1040.0
790,791,Arabica,royal base corporation,Vietnam,"apollo co., ltd.",,"apollo co., ltd.",,royal base corporation,1040m,...,Green,2,"July 23rd, 2013",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1040.0,1040.0,1040.0
860,861,Arabica,"sunvirtue co., ltd.",Vietnam,apollo estate,,apollo estate,,"sunvirtue co., ltd.",1040,...,Bluish-Green,0,"December 20th, 2016",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1040.0,1040.0,1040.0
444,445,Arabica,"sunvirtue co., ltd.",Vietnam,apollo estate,Oriental Paris Natural Coffee,yes,,"sunvirtue co., ltd.",1550,...,,0,"May 8th, 2018",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1550.0,1550.0,1550.0
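The one-liner above takes a fixed number of rows (up to five) per country. If you would rather keep each country's share of the data, a proportional variant can be sketched with `DataFrameGroupBy.sample` and `frac`. The frame `toy` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: three countries with unequal group sizes.
toy = pd.DataFrame({
    "country": ["Brazil"] * 60 + ["Kenya"] * 30 + ["Taiwan"] * 10,
    "score": range(100),
})

# Draw 20% of every stratum; random_state makes the draw repeatable.
stratified = toy.groupby("country").sample(frac=0.2, random_state=0)

# Each country keeps its original share: 12, 6 and 2 rows.
counts = stratified["country"].value_counts().to_dict()
print(counts)
```

The fixed-size version is handy when small groups must be represented; the proportional version keeps the sample's composition close to the full dataset's.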


## 3. Bootstrap Sample (With Replacement)

In [21]:
df.sample(n=len(df), replace=True, random_state=42)

Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Color,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
1126,1127,Arabica,doi tung development project,Thailand,doi tung development project,,doi tung development project,10589010,doi tung development project,800++,...,Green,0,"May 29th, 2014",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,800.0,800.0,800.0
860,861,Arabica,"sunvirtue co., ltd.",Vietnam,apollo estate,,apollo estate,,"sunvirtue co., ltd.",1040,...,Bluish-Green,0,"December 20th, 2016",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1040.0,1040.0,1040.0
1294,1295,Arabica,kurt kappeli,Mexico,various,,u.c.i.r.i.,0016-2722-0001,upctiz zapoteca s.p.r. de r.l.,1200 meters,...,,1,"May 8th, 2015",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1200.0,1200.0,1200.0
1130,1131,Arabica,federacion nacional de cafeteros,Colombia,,,,03-01-2700,federacion nacional de cafeteros,,...,Green,1,"April 22nd, 2016",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,,,
1095,1096,Arabica,bismarck castro,Honduras,las moras,102,cigrah s.a de c.v.,13-111-053,cigrah s.a de c.v,1400,...,Green,4,"April 6th, 2018",Instituto Hondureño del Café,b4660a57e9f8cc613ae5b8f02bfce8634c763ab4,7f521ca403540f81ec99daec7da19c2788393880,m,1400.0,1400.0,1400.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
710,711,Arabica,rodrigo soto,Costa Rica,rio jorco,Tarrazu,rio jorco,5/423/0127,panamerican coffee trading,1550,...,Green,0,"May 17th, 2017",Specialty Coffee Association of Costa Rica,8e0b118f3cf3121ab27c5387deacdb7d4d2a60b1,5eb2b7129d9714c43825e44dc3bca9423de209e9,m,1550.0,1550.0,1550.0
400,401,Arabica,andreas kussmaul,Mexico,ecc,,ecc,1620280402,exportadora café california,1200,...,Green,2,"July 13th, 2016",Asociación Mexicana De Cafés y Cafeterías De E...,3441698871fa609a44ce947e8944ee42eb4428b9,9894541e8065ee718165a1d432389d114defc38c,m,1200.0,1200.0,1200.0
980,981,Arabica,eric thormaehlen,Costa Rica,various,,coricafe sa,5-0048-0126,coricafe s.a.,1200-1400,...,Green,3,"May 9th, 2015",Specialty Coffee Association of Costa Rica,8e0b118f3cf3121ab27c5387deacdb7d4d2a60b1,5eb2b7129d9714c43825e44dc3bca9423de209e9,m,1200.0,1400.0,1300.0
1109,1110,Arabica,pablo enrique martinez gama,Mexico,"la orduña, coatepec, veracruz",,alcafe s.a. de c.v,1104362940,café katsina,1250,...,Green,2,"October 9th, 2013",AMECAFE,59e396ad6e22a1c22b248f958e1da2bd8af85272,0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7,m,1250.0,1250.0,1250.0
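A single bootstrap sample is rarely the goal on its own; the usual next step is to repeat the draw many times and look at how a statistic varies. The sketch below builds a percentile confidence interval for a mean. The `scores` series is synthetic and only stands in for a numeric column such as `Total.Cup.Points`.

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for a numeric column of df.
rng = np.random.default_rng(42)
scores = pd.Series(rng.normal(loc=82.0, scale=3.0, size=200))

# Resample with replacement many times and record the mean of each resample.
n_boot = 1000
boot_means = np.array([
    scores.sample(n=len(scores), replace=True, random_state=i).mean()
    for i in range(n_boot)
])

# Percentile bootstrap: the middle 95% of the resampled means.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={scores.mean():.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Each resample gets its own seed (`random_state=i`) so the run is repeatable while the resamples still differ from one another.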


## 4. Train/Test Split (80/20)

In [22]:
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)
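
The `drop(train.index)` trick relies on the index labels being unique, which holds for the default integer index that `pd.read_csv` produces. A quick sanity check, sketched here on a hypothetical frame `toy`, confirms that the two parts do not overlap and together cover every row.

```python
import pandas as pd

# Hypothetical frame with a default integer index, as produced by pd.read_csv.
toy = pd.DataFrame({"x": range(50)})

train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

# The two parts should not overlap and should cover every row exactly once.
assert train.index.intersection(test.index).empty
assert len(train) + len(test) == len(toy)
print(len(train), len(test))
```

If the index contains duplicate labels, `drop` removes every row that shares a sampled label, so the test set can end up smaller than expected; calling `reset_index(drop=True)` first avoids that.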


## 5. Jackknife Sample (Leave-One-Out)

In [23]:
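# Leave-one-out samples for the first five rows only, to keep the output short.
# Iterate over df.index instead of range(5) to build the full jackknife.
jackknife_samples = [df.drop(i) for i in range(5)]
jackknife_samples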

[      Unnamed: 0  Species                       Owner Country.of.Origin  \
 1              2  Arabica                   metad plc          Ethiopia   
 2              3  Arabica    grounds for health admin         Guatemala   
 3              4  Arabica         yidnekachew dabessa          Ethiopia   
 4              5  Arabica                   metad plc          Ethiopia   
 5              6  Arabica                   ji-ae ahn            Brazil   
 ...          ...      ...                         ...               ...   
 1306        1307  Arabica    juan carlos garcia lopez            Mexico   
 1307        1308  Arabica     myriam kaplan-pasternak             Haiti   
 1308        1309  Arabica  exportadora atlantic, s.a.         Nicaragua   
 1309        1310  Arabica   juan luis alvarado romero         Guatemala   
 1310        1312  Arabica             bismarck castro          Honduras   
 
 ...
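
Leave-one-out samples are typically used to estimate the variability of a statistic. As a sketch, the block below computes the jackknife standard error of a mean over every observation and compares it with the textbook formula. The `values` series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical small sample; any numeric column would work the same way.
values = pd.Series([7.5, 8.0, 8.25, 7.75, 8.5, 7.9])
n = len(values)

# One leave-one-out mean per observation (the full jackknife, not just five rows).
loo_means = np.array([values.drop(i).mean() for i in values.index])

# Jackknife standard error of the mean.
jack_se = np.sqrt((n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2))

# For the mean, this matches the usual formula s / sqrt(n).
classic_se = values.std(ddof=1) / np.sqrt(n)
print(jack_se, classic_se)
```

For the mean the two numbers agree exactly, which makes it a convenient correctness check before applying the same loop to statistics that have no closed-form standard error.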

## 6. Group-Wise Bootstrap Sampling

In [24]:
df.groupby('Processing.Method', group_keys=False).apply(lambda x: x.sample(n=len(x), replace=True))


Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Color,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
706,707,Arabica,pedro santos e silva,Brazil,fazenda são pedro,0063/17,alfenas,,olam,982,...,Green,11,"February 9th, 2018",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,982.0,982.0,982.0
490,491,Arabica,kona pacific farmers cooperative,United States (Hawaii),,,,220454,kona pacific farmers cooperative,,...,,4,"April 6th, 2013",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,ft,,,
838,839,Arabica,ipanema coffees,Brazil,rio verde,,ipanema comercial e exportadora sa,002/4177/0150,ipanema coffees,1,...,Green,1,"October 7th, 2016",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,1.0,1.0,1.0
790,791,Arabica,royal base corporation,Vietnam,"apollo co., ltd.",,"apollo co., ltd.",,royal base corporation,1040m,...,Green,2,"July 23rd, 2013",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1040.0,1040.0,1040.0
270,271,Arabica,compañia colombiana agroindustrial s.a,Colombia,,,trilladora boananza,3-79-0635,ecom cca sa,1550,...,Green,3,"January 17th, 2015",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,1550.0,1550.0,1550.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1300,1301,Arabica,ricardo aaron sampieri marini,Mexico,la morena,,"tlamatoca, hutusco, ver.",1104351023,,1800,...,Green,0,"July 11th, 2013",AMECAFE,59e396ad6e22a1c22b248f958e1da2bd8af85272,0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7,m,1800.0,1800.0,1800.0
96,97,Arabica,kyagalanyi coffee ltd,Uganda,mount elgon area,6133,kyagalanyi coffee ltd,,kyagalanyi coffee ltd,1800,...,Green,1,"July 24th, 2018",Uganda Coffee Development Authority,188fe373b511e21f614564bf86aa4774270d8e04,b7614767a5343729bbde3a2777c60ce836aed928,m,1800.0,1800.0,1800.0
898,899,Arabica,"ceca, s.a.",Costa Rica,cafetalera aquiares,,cafetalera aquiares,5-025-024/25,"ceca,s.a.",1.3,...,Green,1,"February 3rd, 2016",Specialty Coffee Association of Costa Rica,8e0b118f3cf3121ab27c5387deacdb7d4d2a60b1,5eb2b7129d9714c43825e44dc3bca9423de209e9,m,13.0,13.0,13.0
849,850,Arabica,kurt kappeli,Mexico,various,,cafe gourmet de sierra azul sc,0016-2814-0002,globus coffee,1550 meters,...,Green,0,"April 26th, 2015",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1550.0,1550.0,1550.0


## 7. Weighted Sampling by Total Cup Points

In [25]:
df.sample(n=100, weights='Total.Cup.Points', random_state=1)

Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Color,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
532,533,Arabica,doi tung development project,Thailand,doi tung development project,,,,,,...,,0,"April 13th, 2011",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,,,
929,930,Arabica,specialty coffee association of indonesia,Indonesia,"ijen highland, east java",,pt. perkebunan nusantara xii,na,scai,1200-1600masl,...,Green,0,"May 24th, 2013",Specialty Coffee Association of Indonesia,99fa73db21b7acd9c9ceb9dd84e409d2077d55c4,36910838db193ebdd61fa1427bac74622114c49a,m,1200.0,1600.0,1400.0
0,1,Arabica,metad plc,Ethiopia,metad plc,,metad plc,2014/2015,metad agricultural developmet plc,1950-2200,...,Green,0,"April 3rd, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1950.0,2200.0,2075.0
384,385,Arabica,bourbon specialty coffees,Brazil,cachoeira da grama farm,,,002/4542/0886,bourbon specialty coffees,,...,Green,6,"January 15th, 2016",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,,,
184,185,Arabica,carcafe ltda ci,Colombia,,,neiva,3-59-2235,carcafe ltda,442 msnm,...,Green,0,"April 28th, 2016",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,442.0,442.0,442.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1150,1151,Arabica,fredy gordillo reyes,Mexico,union ramal santa cruz,,union ramal santa cruz,2496,union ramal santa cruz spr de ri,1400,...,Green,4,"March 29th, 2014",AMECAFE,59e396ad6e22a1c22b248f958e1da2bd8af85272,0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7,m,1400.0,1400.0,1400.0
460,461,Arabica,juan luis alvarado romero,Guatemala,nueva granada,,beneficio nueva granada,11-326-10,"agricola nueva granada, s.a.",5000 pies,...,Green,1,"March 27th, 2015",Asociacion Nacional Del Café,b1f20fe3a819fd6b2ee0eb8fdc3da256604f1e53,724f04ad10ed31dbb9d260f0dfd221ba48be8a95,ft,1524.0,1524.0,1524.0
1183,1184,Arabica,marco virgilio ramirez teliz,Mexico,el aguacate,,cafes de naranjal s.a. de c.v.,1104367469,cafes de naranjal s.a. de c.v,1000,...,Green,10,"September 10th, 2013",AMECAFE,59e396ad6e22a1c22b248f958e1da2bd8af85272,0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7,m,1000.0,1000.0,1000.0
806,807,Arabica,"lin, che-hao krude 林哲豪",Taiwan,gao chun fang 高醇坊,,gao chun fang 高醇坊,Taiwan,"red on tree co., ltd.",600-700 m,...,Bluish-Green,0,"June 3rd, 2014",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,600.0,700.0,650.0
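To see what `weights` does, it helps to use weights that differ a lot. In the sketch below (toy data, hypothetical column names `item` and `w`), one row holds 98% of the total weight and is drawn roughly that often.

```python
import pandas as pd

# Hypothetical frame: one row carries almost all of the weight.
toy = pd.DataFrame({"item": ["a", "b", "c"], "w": [98.0, 1.0, 1.0]})

# Weights are normalized to sum to 1, so "a" is drawn about 98% of the time.
draws = toy.sample(n=1000, replace=True, weights="w", random_state=1)
share_a = (draws["item"] == "a").mean()
print(share_a)
```

pandas treats missing values in the weight column as zero weight, so rows without a score are never selected. When the weights all sit in a narrow range, as cup scores tend to, the result will look close to a plain random sample.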


## 8. Resample Time-Like Index

In [26]:
df['fake_date'] = pd.date_range(start='2020-01-01', periods=len(df), freq='D')
# 'ME' (month end) replaces the deprecated 'M' alias in pandas 2.2 and later.
df.set_index('fake_date').select_dtypes(include='number').resample('ME').mean().head()



fake_date,Unnamed: 0,Number.of.Bags,Aroma,Flavor,Aftertaste,Acidity,Body,Balance,Uniformity,Clean.Cup,Sweetness,Cupper.Points,Total.Cup.Points,Moisture,Category.One.Defects,Quakers,Category.Two.Defects,altitude_low_meters,altitude_high_meters,altitude_mean_meters
2020-01-31,16.0,139.258065,8.264194,8.39,8.236452,8.299355,8.163548,8.226452,9.935161,10.0,9.913548,8.435161,87.862903,0.064516,0.0,0.0,1.225806,1640.624,1752.424,1696.524
2020-02-29,46.0,131.344828,8.057241,8.066552,7.936552,8.056897,7.895517,7.959655,10.0,10.0,10.0,8.104138,86.077586,0.073448,0.206897,0.206897,1.931034,1304.865455,1442.592727,1373.729091
2020-03-31,76.0,172.516129,7.975484,7.943226,7.806129,7.964839,7.870645,7.953871,9.956774,9.978387,9.978387,7.951613,85.380968,0.078065,0.387097,0.096774,3.225806,1602.786667,1631.902,1617.344333
2020-04-30,106.5,109.9,7.839333,7.911667,7.802333,7.863667,7.822667,7.835667,9.977667,9.977667,10.0,7.921667,84.952667,0.070667,0.3,0.4,3.0,1302.69,1438.598,1370.644
2020-05-31,137.0,145.709677,7.849677,7.791613,7.725161,7.822581,7.733871,7.854839,10.0,10.0,9.956774,7.92,84.657097,0.087742,0.032258,0.032258,2.612903,1430.422857,1473.28,1451.851429
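Unlike the other examples, `resample` does not draw rows at random: it groups rows into time bins and aggregates each bin. A small sketch on a hypothetical daily series makes the binning visible. Month-end aliases differ between pandas versions (`'M'` in older releases, `'ME'` in newer ones), so this example uses the month-start alias `'MS'`, which is assumed here to be the safer choice across versions.

```python
import pandas as pd

# Hypothetical daily series covering January and February 2020.
idx = pd.date_range(start="2020-01-01", end="2020-02-29", freq="D")
daily = pd.DataFrame({"value": 1.0}, index=idx)

# Group rows by calendar month (labelled by month start) and sum each bin.
monthly = daily.resample("MS").sum()
print(monthly)
```

Because every day contributes 1.0, the monthly sums equal the number of days in each month (31 and 29), which is an easy way to confirm the bins are what you expect.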


## 9. Upsampling Small Groups

In [27]:
df.groupby('Processing.Method', group_keys=False).apply(lambda x: x.sample(n=100, replace=True))


Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters,fake_date
716,717,Arabica,ipanema coffees,Brazil,rio verde,,ipanema coffees,002/1660/0065,ipanema coffees,1100,...,0,"October 15th, 2015",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1100.0,1100.0,1100.0,2021-12-17
482,483,Arabica,"lin, che-hao krude 林哲豪",Taiwan,kan tou mountain coffee 崁頭山咖啡館,,kan tou mountain coffee 崁頭山咖啡館,Taiwan,taiwan coffee laboratory,700-800m,...,0,"May 18th, 2016",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,700.0,800.0,750.0,2021-04-27
848,849,Arabica,,Honduras,gran manzana y el aguacate,,cigrah sps,13-111-311,cigrah,1350,...,4,"May 16th, 2015",Instituto Hondureño del Café,b4660a57e9f8cc613ae5b8f02bfce8634c763ab4,7f521ca403540f81ec99daec7da19c2788393880,m,1350.0,1350.0,1350.0,2022-04-28
1257,1258,Arabica,eileen koyanagi,United States (Hawaii),,,kona pacific farmers cooperative,Specialty Coffee Association of America,kona pacific farmers cooperative,,...,5,"January 27th, 2015",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,ft,,,,2023-06-11
1307,1308,Arabica,myriam kaplan-pasternak,Haiti,200 farms,,coeb koperativ ekselsyo basen (350 members),,haiti coffee,~350m,...,16,"May 24th, 2013",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,350.0,350.0,350.0,2023-07-31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
950,951,Arabica,racafe & cia s.c.a,Colombia,,3-37-0059,bachue,3-37-0059,racafe & cia s.c.a,,...,0,"November 8th, 2017",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,,,,2022-08-08
449,450,Arabica,consejo salvadoreño del café,El Salvador,agua caliente,1,agua caliente,9-392-68,consejo salvadoreño del café,1350,...,0,"March 29th, 2017",Salvadoran Coffee Council,3d4987e3b91399dbb3938b5bdf53893b6ef45be1,27b21e368fb8291cbea02c60623fe6c98f84524d,m,1350.0,1350.0,1350.0,2021-03-25
565,566,Arabica,juan luis alvarado romero,Guatemala,el papaturro,,beneficio ixchel,11/23/0578,"unex guatemala, s.a.",4000psn,...,2,"July 19th, 2016",Asociacion Nacional Del Café,b1f20fe3a819fd6b2ee0eb8fdc3da256604f1e53,724f04ad10ed31dbb9d260f0dfd221ba48be8a95,ft,1219.2,1219.2,1219.2,2021-07-19
752,753,Arabica,exportadora de cafe condor s.a,Colombia,,,trilladora boananza,3-68-0088,exportadora de cafe condor s.a,1750,...,1,"February 21st, 2013",Almacafé,e493c36c2d076bf273064f7ac23ad562af257a25,70d3c0c26f89e00fdae6fb39ff54f0d2eb1c38ab,m,1750.0,1750.0,1750.0,2022-01-22
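Note that a fixed `n=100` with `replace=True` upsamples groups smaller than 100 and shrinks larger ones to the same size. The sketch below shows both effects on a hypothetical two-group frame, using `DataFrameGroupBy.sample` as an alternative to `apply`.

```python
import pandas as pd

# Hypothetical imbalanced data: one large group and one small group.
toy = pd.DataFrame({
    "method": ["washed"] * 150 + ["honey"] * 10,
    "score": range(160),
})

# Every group ends up with exactly 100 rows.
resampled = toy.groupby("method").sample(n=100, replace=True, random_state=0)
sizes = resampled["method"].value_counts().to_dict()

# The small group must contain repeated rows: 100 draws from only 10 originals.
honey_unique = resampled.loc[resampled["method"] == "honey", "score"].nunique()
print(sizes, honey_unique)
```

Upsampled rows are exact duplicates, so they balance group sizes without adding any new information.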


## 10. Downsampling to the Minimum Group Size


In [28]:
min_n = df['Processing.Method'].value_counts().min()
df.groupby('Processing.Method', group_keys=False).apply(lambda x: x.sample(n=min_n))


Unnamed: 0.1,Unnamed: 0,Species,Owner,Country.of.Origin,Farm.Name,Lot.Number,Mill,ICO.Number,Company,Altitude,...,Category.Two.Defects,Expiration,Certification.Body,Certification.Address,Certification.Contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters,fake_date
405,406,Arabica,eileen koyanagi,United States (Hawaii),,,,KP012314,kona pacific farmers cooperative,,...,1,"February 25th, 2015",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,ft,,,,2021-02-09
1095,1096,Arabica,bismarck castro,Honduras,las moras,102,cigrah s.a de c.v.,13-111-053,cigrah s.a de c.v,1400,...,4,"April 6th, 2018",Instituto Hondureño del Café,b4660a57e9f8cc613ae5b8f02bfce8634c763ab4,7f521ca403540f81ec99daec7da19c2788393880,m,1400.0,1400.0,1400.0,2022-12-31
936,937,Arabica,cafebras,Brazil,fazenda rio brilhante,,17/18,002/1495/0134,cafebras comercio de cafés do brasil sa,1100,...,2,"December 2nd, 2016",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,1100.0,1100.0,1100.0,2022-07-25
261,262,Arabica,ipanema coffees,Brazil,capoeirinha,007/16E,dry mill,002/1660/0105,ipanema coffees,934,...,7,"August 16th, 2017",Brazil Specialty Coffee Association,3297cfa4c538e3dd03f72cc4082c54f7999e1f9d,8900f0bf1d0b2bafe6807a73562c7677d57eb980,m,934.0,934.0,934.0,2020-09-18
807,808,Arabica,owen carver,Brazil,café do paraíso,,café do paraíso,??,café do paraíso,894m - 1183m,...,0,"May 21st, 2014",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,894.0,1183.0,1038.5,2022-03-18
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
520,521,Arabica,alfredo bojalil,Mexico,las lomas,,,2222,ecomtrading,1200,...,8,"July 11th, 2013",AMECAFE,59e396ad6e22a1c22b248f958e1da2bd8af85272,0eb4ee5b3f47b20b049548a2fd1e7d4a2b70d0a7,m,1200.0,1200.0,1200.0,2021-06-04
255,256,Arabica,taylor winch (coffee) ltd.,Kenya,-,,,37-0569-2347,taylor winch (coffee) ltd,1650,...,0,"May 31st, 2013",Kenya Coffee Traders Association,ccba45b89d859740b749878be8c6d16fbdb96c2e,d752c909a015f3c76224b3c5cc520f8a67afda74,m,1650.0,1650.0,1650.0,2020-09-12
1021,1022,Arabica,cqi taiwan icp cqi台灣合作夥伴,Taiwan,張文進莊園,,張文進莊園,Taiwan台灣,宸嶧國際,550公尺,...,0,"November 23rd, 2015",Blossom Valley International,fc45352eee499d8470cf94c9827922fb745bf815,de73fc9412358b523d3a641501e542f31d2668b0,m,550.0,550.0,550.0,2022-10-18
1262,1263,Arabica,juan luis alvarado romero,Guatemala,conquista / morito,,beneficio ixchel,11/23/01,asociación nacional del café - anacafe -,,...,4,"January 2nd, 2013",Asociacion Nacional Del Café,b1f20fe3a819fd6b2ee0eb8fdc3da256604f1e53,724f04ad10ed31dbb9d260f0dfd221ba48be8a95,m,,,,2023-06-16
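
The same balancing step can be sketched without `apply` by calling `DataFrameGroupBy.sample` directly. The frame `toy` and its group labels are hypothetical.

```python
import pandas as pd

# Hypothetical imbalanced groups.
toy = pd.DataFrame({
    "method": ["washed"] * 50 + ["natural"] * 20 + ["honey"] * 5,
    "score": range(75),
})

# Size of the smallest group sets the per-group sample size.
min_n = toy["method"].value_counts().min()

# GroupBy.sample draws the same number of rows from every group.
balanced = toy.groupby("method").sample(n=min_n, random_state=0)
balanced_sizes = balanced["method"].value_counts().to_dict()
print(balanced_sizes)
```

In both variants, `groupby` skips rows whose grouping key is missing by default, so records without a processing method do not appear in the balanced result.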
