# MICS Download

This notebook will call the MICS_module and use the functions to download and sort new MICS data.

### Instructions 
1. When a new MICS survey is available, go to the MICS survey site (https://mics.unicef.org/surveys). Click the upload button and select CSV next to the grayed-out "DOWNLOAD MICS DATASETS" button to get the most up-to-date metadata.

2. Select the correct survey round, the appropriate region and select MICS as the datatype. Click "DOWNLOAD MICS DATASETS".

    **NOTE:** The MICS_module is designed to sort bulk MICS downloads. If you download only one dataset from a single country, these functions will not work; in that case, sort the dataset into the appropriate folder manually.

3. Import the MICS_module into your notebook.

4. Use MICS_module to process the downloaded metadata csv from step 1:

    ```{python}
    metadata = MICS_module.process_mics_metadata('path_to_metadata', 'MICS/ISO3_country_codes.csv')
    ``` 

5.  Check for missing values in the metadata and fill them in as needed. There should not be any missing values at present, but some may appear as country names change.

    The MICS metadata is organized as follows:  
    - round: The MICS round (e.g., "MICS6").
    - round_num: The numeric representation of the MICS round (e.g., 6).
    - country_x: The original country name from the MICS metadata.
    - country_code: The ISO3 country code.
    - year: The year of the survey.
    - save_name: The standardized country name used for saving files.
    - standardized: The standardized country name for merging and sorting.
    
    

  
6. Use MICS_module to sort the zipped datasets downloaded in step 2:  

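    `MICS_module.extract_and_save_zipped_files('log/file/path.txt', 'zip/file/path/', metadata, survey_round_number)`

    Here `metadata` is the DataFrame returned in step 4 and `survey_round_number` is the numeric round (e.g., 6).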
   

  
7. Use MICS_module to look at possible errors  

    `log = MICS_module.parse_log_to_df('log/file/path.txt')`    
    
    
    Filter the log to see failures and the files where a manual check is advised:
    
    View failures:  
    `log[log['success'] == False]`  
    View files to check manually:  
    `log[log['manual_check_advised'].notnull()]`
    
    To read the full failure reason for a specific row: `log[log['success'] == False]['failure_reason'][row_num]`, where `row_num` is that row's index label in the log.
    
    It is also advisable to check for missing values in the `success` column, as unexpected edge cases can cause issues there.
    
    The log dataframe has the following columns:
    - zip_file: The name of the zip file being processed.
    - normalized_country: The normalized country name extracted from the log.
    - standardized_country: The standardized country name after matching with metadata.
    - metadata_row: The metadata row associated with the country.
    - extracted_country: The country name extracted from the file.
    - metadata_rows_found: The number of metadata rows found for the country.
    - available_years: The years available for the dataset, extracted from the log.
    - unzipping_to: The directory where the zip file was extracted.
    - saved_to: The directory where the processed files were saved.
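    - success: True or False depending on whether processing succeeded, or 'territory' for sub-national and sub-population datasets (e.g., "Pakistan Punjab MICS6 Datasets.zip").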
    - failure_reason: A description of the reason for failure, if applicable.
    - manual_check_advised: Notes indicating if manual intervention is required.

    

In [None]:
#!pip install pycountry
#!pip install unidecode
#!pip install rapidfuzz

Defaulting to user installation because normal site-packages is not writeable
Collecting rapidfuzz
  Downloading rapidfuzz-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading rapidfuzz-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: rapidfuzz
Successfully installed rapidfuzz-3.13.0


In [4]:
#import mics module
import MICS_module

In [2]:
#process mics metadata
metadata = MICS_module.process_mics_metadata('MICS/mics_surveys_catalogue.csv', 'MICS/ISO3_country_codes.csv')

In [6]:
#view metadata
metadata.head()

Unnamed: 0,round,round_num,country_x,country_code,year,save_name,standardized
0,MICS7,7,Viet Nam,VNM,2027,Vietnam,Viet Nam
1,MICS7,7,Armenia,ARM,2026,Armenia,Armenia
2,MICS7,7,Tuvalu,TUV,2026,Tuvalu,Tuvalu
3,MICS7,7,Fiji,FJI,2026,Fiji,Fiji
4,MICS7,7,Kiribati,KIR,2026,Kiribati,Kiribati
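
Step 5 of the instructions asks for a check on missing values in the metadata. A minimal sketch of such a check, assuming the metadata is a pandas DataFrame with the columns listed in the instructions. The helper `missing_value_report` and the toy rows are hypothetical and are not part of MICS_module:

```python
import pandas as pd

# Columns described in the instructions for the processed metadata.
METADATA_COLUMNS = ["round", "round_num", "country_x", "country_code",
                    "year", "save_name", "standardized"]


def missing_value_report(df, columns=METADATA_COLUMNS):
    """Return (missing count per column, rows with at least one missing value)."""
    # Only inspect the expected columns that are actually present.
    present = [c for c in columns if c in df.columns]
    counts = df[present].isna().sum()
    incomplete = df[df[present].isna().any(axis=1)]
    return counts, incomplete


# Hypothetical toy data: the second row imitates a country whose name
# no longer matches the ISO3 lookup, leaving several fields empty.
toy_metadata = pd.DataFrame({
    "round": ["MICS6", "MICS6"],
    "round_num": [6, 6],
    "country_x": ["Nepal", "Renamed Country"],
    "country_code": ["NPL", None],
    "year": ["2019", "2020"],
    "save_name": ["Nepal", None],
    "standardized": ["Nepal", None],
})

missing_counts, incomplete_rows = missing_value_report(toy_metadata)
print(missing_counts[missing_counts > 0])
print(incomplete_rows)
```

With the real data the call would be `missing_value_report(metadata)`; any rows it returns can then be filled in by hand before sorting.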


In [5]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/test_log.txt', 'MICS/MICS_zip/MICS_Datasets (6).zip', metadata, 6)
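
Because the module expects a bulk download, it can help to look inside the archive before running the extraction. The sketch below assumes a bulk download is a zip file containing one zip per survey, which is what the `zip_file` names in the log suggest but is not confirmed by the module itself. `list_nested_zips` and the in-memory archive are hypothetical, built only for illustration:

```python
import io
import zipfile


def list_nested_zips(zip_source):
    """Return the names of .zip members inside an archive (path or file-like)."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".zip")]


def _make_zip(members):
    """Build an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Hypothetical bulk archive: two per-country zips plus an unrelated file.
inner = _make_zip({"hh.sav": b"placeholder"}).getvalue()
bulk = _make_zip({
    "Nepal MICS6 Datasets.zip": inner,
    "Madagascar MICS6 datasets.zip": inner,
    "readme.txt": b"not a dataset",
})

nested = list_nested_zips(bulk)
print(len(nested), "per-country archives found")
```

If the list is empty for a real download, the file is probably a single-country dataset and falls under the manual-sorting note in the instructions.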

In [7]:
#parse error log
log = MICS_module.parse_log_to_df('MICS/MICS_error_logs/test_log.txt')

In [11]:
#view the full failure reason for row 2
log[log['success'] == False]['failure_reason'][2]

"[Errno 2] No such file or directory: '../individual_country_data/BGD_Bangladesh'"

In [12]:
#view files to manually check  
log[log['manual_check_advised'].notnull()]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised
2,Bangladesh MICS6 SPSS Datasets.zip,bangladesh,Bangladesh,Bangladesh,Bangladesh,1,['2019'],,,False,[Errno 2] No such file or directory: '../indiv...,Manual check advised
29,Thailand MICS6 and Thailand Selected 17 Provin...,thailand,Thailand,Thailand,Thailand,2,['2022' '2019'],THA2019MC6,,False,Destination path '../individual_country_data/T...,Manual check advised
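
The Bangladesh failure in this log is a missing destination folder. One way to catch that before extraction is to check which country folders exist. This is a sketch only: the folder pattern (ISO3 code, underscore, save name with spaces replaced by underscores) is inferred from paths such as `BGD_Bangladesh` and `VNM_Vietnam` in the log, and `missing_country_folders` is a hypothetical helper, not a MICS_module function:

```python
import tempfile
from pathlib import Path


def missing_country_folders(base_dir, countries):
    """Return expected folder names that do not exist under base_dir.

    countries is an iterable of (country_code, save_name) pairs.
    """
    base = Path(base_dir)
    missing = []
    for code, save_name in countries:
        # Assumed naming pattern, e.g. BGD_Bangladesh.
        folder = f"{code}_{save_name.replace(' ', '_')}"
        if not (base / folder).is_dir():
            missing.append(folder)
    return missing


# Demonstration in a temporary directory rather than the real data tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "NPL_Nepal").mkdir()
    absent = missing_country_folders(
        tmp, [("NPL", "Nepal"), ("BGD", "Bangladesh")]
    )
    print(absent)
```

Against the real tree, the pairs could be taken from the `country_code` and `save_name` columns of the metadata, with `base_dir` pointing at the `individual_country_data` folder.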


In [14]:
#view sub-national datasets flagged as 'territory'
log[log['success'] == 'territory']

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised
0,Pakistan Sindh MICS6 Datasets.zip,,,,,,,,,territory,,
1,Pakistan (Balochistan) MICS6 Datasets.zip,,,,,,,,,territory,,
11,Montenegro (Roma Settlements) MICS6 Datasets.zip,,,,,,,,,territory,,
37,Republic of North Macedonia (Roma Settlements)...,,,,,,,,,territory,,
41,"Kosovo (UNSCR 1244) (Roma, Ashkali and Egyptia...",,,,,,,,,territory,,
48,Pakistan Khyber Pakhtunkhwa MICS6 Datasets.zip,,,,,,,,,territory,,
54,Serbia (Roma Settlements) MICS6 Datasets.zip,,,,,,,,,territory,,
66,Pakistan Punjab MICS6 Datasets.zip,,,,,,,,,territory,,


In [13]:
log

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised
0,Pakistan Sindh MICS6 Datasets.zip,,,,,,,,,territory,,
1,Pakistan (Balochistan) MICS6 Datasets.zip,,,,,,,,,territory,,
2,Bangladesh MICS6 SPSS Datasets.zip,bangladesh,Bangladesh,Bangladesh,Bangladesh,1,['2019'],,,False,[Errno 2] No such file or directory: '../indiv...,Manual check advised
3,Dominican Republic MICS6 Datasets.zip,dominican republic,Dominican Republic,Dominican Republic,Dominican Republic,1,['2019'],DOM2019MC6,/individual_country_data/DOM_Dominican_Republi...,True,,
4,Viet Nam MICS6 Datasets.zip,viet nam,Viet Nam,Vietnam,Viet Nam,1,['2020-2021'],VNM2021MC6,/individual_country_data/VNM_Vietnam/03_Survey...,True,,
...,...,...,...,...,...,...,...,...,...,...,...,...
63,Madagascar MICS6 datasets.zip,madagascar,Madagascar,Madagascar,Madagascar,1,['2018'],MDG2018MC6,/individual_country_data/MDG_Madagascar/03_Sur...,True,,
64,Azerbaijan MICS6 2023 Datasets.zip,azerbaijan,Azerbaijan,Azerbaijan,Azerbaijan,1,['2023'],AZE2023MC6,/individual_country_data/AZE_Azerbaijan/03_Sur...,True,,
65,Nepal MICS6 Datasets.zip,nepal,Nepal,Nepal,Nepal,1,['2019'],NPL2019MC6,/individual_country_data/NPL_Nepal/03_Survey_d...,True,,
66,Pakistan Punjab MICS6 Datasets.zip,,,,,,,,,territory,,
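
The `success` column mixes True, False and 'territory', so a quick tally of outcomes is a convenient first look at a new log. The sketch below also counts missing `success` values and lists rows where more than one metadata row matched, as happened for Thailand. `summarise_log` and the toy log are hypothetical; only the column names come from the log shown here:

```python
import pandas as pd


def summarise_log(log_df):
    """Count outcomes in the success column, including missing values."""
    # Cast to string so True, False and 'territory' are tallied together;
    # missing values get their own '<missing>' label.
    outcome = log_df["success"].map(
        lambda v: "<missing>" if pd.isna(v) else str(v)
    )
    return outcome.value_counts()


# Hypothetical toy log with one row per possible outcome.
toy_log = pd.DataFrame({
    "zip_file": ["A.zip", "B.zip", "C.zip", "D.zip"],
    "success": [True, False, "territory", None],
    "metadata_rows_found": [1, 1, None, 2],
})

summary = summarise_log(toy_log)
# Rows where several metadata rows matched may need a manual check.
multi = toy_log[toy_log["metadata_rows_found"] > 1]
print(summary)
print(multi["zip_file"].tolist())
```

Casting to string also covers the case where the parsed log stores the outcomes as text rather than booleans.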

