# MICS Download
This notebook imports `MICS_module` and uses its functions to process and sort newly downloaded MICS data.

# Instructions for Individual Survey Download

1. When a new MICS survey is available, go to the [MICS survey site](https://mics.unicef.org/surveys). Click the upload button and select CSV next to the grayed-out "DOWNLOAD MICS DATASETS" button to get the most up-to-date metadata.

2. Select the correct survey round, the appropriate country and select MICS as the datatype. Download the survey set.

3. If available, download the GIS data as well.

4. Import the `MICS_module` into your notebook.

    **NOTE** The `MICS_module` is made to sort *bulk* MICS datasets. If you download only one dataset from a country, this function will not work, and it is best to sort that dataset manually into the appropriate folder.

    ```python
    import MICS_module
    ``` 

5. Use `MICS_module` to process the downloaded metadata CSV from step 1:

    ```python
    metadata = MICS_module.process_mics_metadata('path_to_metadata', 'MICS/ISO3_country_codes.csv')
    ``` 

6.  Check for missing values in metadata and fill in as needed. There shouldn't be any missing values right now, but as country names change, there might be.
    
    The MICS metadata is organized as follows: 
    - `round`: The MICS round (e.g., `"MICS6"`).
    - `round_num`: The numeric representation of the MICS round (e.g., `6`).
    - `country_x`: The original country name from the MICS metadata.
    - `country_code`: The ISO3 country code.
    - `year`: The year of the survey.
    - `save_name`: The standardized country name used for saving files.
    - `standardized`: The standardized country name for merging and sorting.

7. Use `MICS_module` to extract and sort the downloaded zip file.

    ```python
    MICS_module.extract_single_zipped_survey('zip/file/path.zip', metadata, survey_round_number)
    ```

8. Check the printed output log to make sure no errors occurred.
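
The missing-value check in step 6 can be sketched with plain pandas. This is a minimal sketch that assumes `metadata` is a pandas DataFrame with the columns listed above; `rows_with_missing_values` is a hypothetical helper (not part of `MICS_module`), and the toy rows, including "Example Country" and the `"XXX"` code, are illustrative only.

```python
import pandas as pd


def rows_with_missing_values(metadata: pd.DataFrame) -> pd.DataFrame:
    """Return only the metadata rows that contain at least one missing value."""
    return metadata[metadata.isna().any(axis=1)]


# Toy stand-in for the processed metadata, using the documented column names.
example = pd.DataFrame(
    {
        "round": ["MICS6", "MICS6"],
        "round_num": [6, 6],
        "country_x": ["Thailand", "Example Country"],
        "country_code": ["THA", None],  # simulated unmatched ISO3 code
        "year": [2019, 2020],
        "save_name": ["Thailand", None],  # simulated missing save name
        "standardized": ["Thailand", "Example Country"],
    }
)

# Count missing values per column, then list the affected rows.
print(example.isna().sum())
missing = rows_with_missing_values(example)
print(missing)

# Fill the gaps by hand once the correct values are known (placeholders here).
is_example = example["country_x"] == "Example Country"
example.loc[is_example, "country_code"] = "XXX"
example.loc[is_example, "save_name"] = "Example_Country"
remaining = rows_with_missing_values(example)
```

After the manual fill, `remaining` should be empty, which is the state the metadata needs to be in before the extraction step.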

# Instructions for Bulk Download

1. When a new MICS survey is available, go to the [MICS survey site](https://mics.unicef.org/surveys). Click the upload button and select CSV next to the grayed-out "DOWNLOAD MICS DATASETS" button to get the most up-to-date metadata.

2. Select the correct survey round, the appropriate region and select **MICS** as the datatype. Click "DOWNLOAD MICS DATASETS".



3. Import the `MICS_module` into your notebook.

    **NOTE** The `MICS_module` is made to sort *bulk* MICS datasets. If you download only one dataset from a country, this function will not work, and it is best to sort that dataset manually into the appropriate folder.

    ```python
    import MICS_module
    ``` 


4. Use `MICS_module` to process the downloaded metadata CSV from step 1:

    ```python
    metadata = MICS_module.process_mics_metadata('path_to_metadata', 'MICS/ISO3_country_codes.csv')
    ``` 

5.  Check for missing values in metadata and fill in as needed. There shouldn't be any missing values right now, but as country names change, there might be.
    
    The MICS metadata is organized as follows: 
    - `round`: The MICS round (e.g., `"MICS6"`).
    - `round_num`: The numeric representation of the MICS round (e.g., `6`).
    - `country_x`: The original country name from the MICS metadata.
    - `country_code`: The ISO3 country code.
    - `year`: The year of the survey.
    - `save_name`: The standardized country name used for saving files.
    - `standardized`: The standardized country name for merging and sorting.


6. Use `MICS_module` to sort the zipped datasets downloaded in step 2:  

    ```python 
    MICS_module.extract_and_save_zipped_files('log/file/path.txt', 'zip/file/path/', metadata, survey_round_number)
    ```

7. Use `MICS_module` to look at possible errors:
    
    ```python
    log = MICS_module.parse_log_to_df('log/file/path.txt')
    ```

    Filter to see failures and rows where a manual check is advised:
    
    ```python
    log[(log['manual_check_advised'].notnull()) | (log['success'] == False)]
    ```
    
    To see the full failure reason for a specific row:
    
    ```python 
    log[log['success'] == False]['failure_reason'][row_num]
    ```
    
    It is also advisable to check for missing data in the `success` column, as unexpected edge cases can cause issues there.
    
    The log dataframe has the following columns: 
                
    - `zip_file`: The name of the zip file being processed.
    - `normalized_country`: The normalized country name extracted from the log.
    - `standardized_country`: The standardized country name after matching with metadata.
    - `metadata_row`: The metadata row associated with the country.
    - `extracted_country`: The country name extracted from the file.
    - `metadata_rows_found`: The number of metadata rows found for the country.
    - `available_years`: The years available for the dataset, extracted from the log.
    - `unzipping_to`: The directory where the zip file was extracted.
    - `saved_to`: The directory where the processed files were saved.
    - `success`: A boolean indicating whether the processing was successful.
    - `failure_reason`: A description of the reason for failure, if applicable.
    - `manual_check_advised`: Notes indicating if manual intervention is required. This can be one of:
        - NaN
        - "Manual check advised"
        - "Fuzzy match used" — in this case, check the `saved_to` output manually to ensure correctness.
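
The checks described in step 7 (failures, manual-check flags, and missing `success` values) can be combined into one filter. The sketch below assumes the parsed log is a pandas DataFrame with the columns listed above; `rows_needing_review` is a hypothetical helper, not a `MICS_module` function, and the example rows are made up for illustration.

```python
import pandas as pd


def rows_needing_review(log: pd.DataFrame) -> pd.DataFrame:
    """Return log rows that failed, have no recorded outcome, or are flagged."""
    failed = log["success"].eq(False)            # explicit failures
    unknown = log["success"].isna()              # outcome missing from the log
    flagged = log["manual_check_advised"].notna()  # fuzzy match or manual check
    return log[failed | unknown | flagged]


# Toy log with one clean row and three rows that need attention.
example_log = pd.DataFrame(
    {
        "zip_file": ["a.zip", "b.zip", "c.zip", "d.zip"],
        "success": [True, False, None, True],
        "failure_reason": [None, "Destination path already exists", None, None],
        "manual_check_advised": [
            None,
            "Manual check advised",
            None,
            "Fuzzy match used",
        ],
    }
)

review = rows_needing_review(example_log)
print(review[["zip_file", "success", "manual_check_advised"]])
```

Including the missing-`success` rows in the same filter means an unexpected edge case in log parsing shows up next to the ordinary failures instead of being silently skipped.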

# Code Runthrough For MICS Rounds 2-6

## Install packages and Import MICS_module

In [None]:
#!pip install pycountry
#!pip install unidecode
#!pip install rapidfuzz

Defaulting to user installation because normal site-packages is not writeable
Collecting rapidfuzz
  Downloading rapidfuzz-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading rapidfuzz-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: rapidfuzz
Successfully installed rapidfuzz-3.13.0


In [1]:
#import mics module
import MICS_module

## Process Metadata

In [2]:
#process mics metadata
metadata = MICS_module.process_mics_metadata('MICS/mics_surveys_catalogue.csv', 'MICS/ISO3_country_codes.csv')

In [3]:
#view metadata
metadata.head()

Unnamed: 0,round,round_num,country_x,country_code,year,save_name,standardized
0,MICS7,7,Viet Nam,VNM,2027,Vietnam,Viet Nam
1,MICS7,7,Armenia,ARM,2026,Armenia,Armenia
2,MICS7,7,Tuvalu,TUV,2026,Tuvalu,Tuvalu
3,MICS7,7,Fiji,FJI,2026,Fiji,Fiji
4,MICS7,7,Kiribati,KIR,2026,Kiribati,Kiribati


## Run MICS Extract and Save Files
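
The per-round cells that follow all use the same call pattern; only the round number changes in the log path, the zip path, and the final argument. A minimal sketch of that pattern as a loop is shown here. `round_paths` and `run_rounds` are hypothetical helper names, not part of `MICS_module`, and the extraction function is passed in as a parameter so the sketch runs without the module installed.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def round_paths(round_num: int) -> Tuple[str, str]:
    """Build the log and zip paths used in this notebook for one MICS round."""
    log_path = f"MICS/MICS_error_logs/log{round_num}.txt"
    zip_path = f"MICS/MICS_zip/MICS_Datasets ({round_num}).zip"
    return log_path, zip_path


def run_rounds(
    rounds: Iterable[int],
    extract: Callable[[str, str, Any, int], Any],
    metadata: Any,
) -> Dict[int, str]:
    """Call `extract` once per round and return the log path for each round."""
    log_paths = {}
    for round_num in rounds:
        log_path, zip_path = round_paths(round_num)
        extract(log_path, zip_path, metadata, round_num)
        log_paths[round_num] = log_path
    return log_paths


# Stand-in for the real extraction function: it only records its arguments.
calls = []


def fake_extract(log_path, zip_path, metadata, round_num):
    calls.append((log_path, zip_path, round_num))


logs = run_rounds([6, 5], fake_extract, metadata=None)
print(logs)

# With the real module, the call would presumably look like:
# run_rounds([6, 5, 4, 3, 2], MICS_module.extract_and_save_zipped_files, metadata)
```

Running the rounds one cell at a time, as this notebook does, keeps each round's log review next to its extraction, so the loop is mainly useful for a full re-run.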

### MICS Round 6

#### Extract and Sort

In [4]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/log6.txt', 'MICS/MICS_zip/MICS_Datasets (6).zip', metadata, 6)

#### Examine Error Log

In [5]:
#parse error log
log6 = MICS_module.parse_log_to_df('MICS/MICS_error_logs/log6.txt')

In [8]:
#view files to manually check and files that failed to download
log6[(log6['manual_check_advised'].notnull()) | (log6['success'] == False)]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised
29,Thailand MICS6 and Thailand Selected 17 Provin...,thailand,Thailand,Thailand,Thailand,2,['2022' '2019'],THA2019MC6,,False,Destination path '../Individual_country_data/T...,Manual check advised


This was only a partial failure. The Thailand zip file has an unusual layout that the code for flattening the file structure could not fully handle.
The "Thailand MICS6 and Thailand Selected 17 Provinces.zip" file contains two nested zip files, one with the province-specific data and one with the country-wide data. After the first nested file was unzipped, the attempt to unzip the second one into the same folder raised an error because the destination path already existed. The province data has been removed, so this error can be considered resolved.
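
One way to avoid this kind of collision is to give each nested zip its own subfolder instead of flattening everything into one destination. The sketch below illustrates that idea using only the standard library; it is not how `MICS_module` works. `extract_nested_zips` is a hypothetical helper, and the file names `national.zip`, `provinces.zip`, and `hh.sav` are invented for the demonstration.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, List


def extract_nested_zips(outer_zip, dest: Path) -> List[Path]:
    """Extract every zip nested inside `outer_zip` into its own subfolder of `dest`."""
    created = []
    with zipfile.ZipFile(outer_zip) as outer:
        for name in outer.namelist():
            if not name.lower().endswith(".zip"):
                continue
            # One folder per nested zip, named after the nested file.
            target = dest / Path(name).stem
            target.mkdir(parents=True, exist_ok=True)
            with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                inner.extractall(target)
            created.append(target)
    return created


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive from a mapping of name to content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Two nested archives that both contain a file with the same name.
outer_bytes = make_zip(
    {
        "national.zip": make_zip({"hh.sav": b"national"}),
        "provinces.zip": make_zip({"hh.sav": b"provinces"}),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    folders = extract_nested_zips(io.BytesIO(outer_bytes), Path(tmp))
    extracted = sorted(
        path.relative_to(tmp).as_posix() for path in Path(tmp).rglob("hh.sav")
    )

print(extracted)
```

Both copies of `hh.sav` survive because each lands in a folder named after its own archive, so the second extraction never targets a path that already exists.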

### MICS Round 5

#### Extract and Sort

In [9]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/log5.txt', 'MICS/MICS_zip/MICS_Datasets (5).zip', metadata, 5)

#### Examine Error Log

In [10]:
#parse error log
log5 = MICS_module.parse_log_to_df('MICS/MICS_error_logs/log5.txt')

In [13]:
#view files to manually check and files that failed to download
log5[(log5['manual_check_advised'].notnull()) | (log5['success'] == False)]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised


### MICS Round 4

#### Extract and Sort

In [14]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/log4.txt', 'MICS/MICS_zip/MICS_Datasets (4).zip', metadata, 4)

#### Examine Error Log

In [15]:
#parse error log
log4 = MICS_module.parse_log_to_df('MICS/MICS_error_logs/log4.txt')

In [16]:
#view files to manually check and files that failed to download
log4[(log4['manual_check_advised'].notnull()) | (log4['success'] == False)]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised


### MICS Round 3

#### Extract and Sort

In [17]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/log3.txt', 'MICS/MICS_zip/MICS_Datasets (3).zip', metadata, 3)

#### Examine Error Log

In [18]:
#parse error log
log3 = MICS_module.parse_log_to_df('MICS/MICS_error_logs/log3.txt')

In [19]:
#view files to manually check and files that failed to download
log3[(log3['manual_check_advised'].notnull()) | (log3['success'] == False)]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised


### MICS Round 2

#### Extract and Sort

In [20]:
#extract and sort data
MICS_module.extract_and_save_zipped_files('MICS/MICS_error_logs/log2.txt', 'MICS/MICS_zip/MICS_Datasets (2).zip', metadata, 2)

#### Examine Error Log

In [21]:
#parse error log
log2 = MICS_module.parse_log_to_df('MICS/MICS_error_logs/log2.txt')

In [22]:
#view files to manually check and files that failed to download
log2[(log2['manual_check_advised'].notnull()) | (log2['success'] == False)]

Unnamed: 0,zip_file,normalized_country,standardized_country,metadata_row,extracted_country,metadata_rows_found,available_years,unzipping_to,saved_to,success,failure_reason,manual_check_advised
41,Sao Tome and Principle 2000 MICS_Datasets.zip,sao tome and principle,error,,error,1,['2000'],STP2000MC2,/Individual_country_data/STP_Sao_Tome_and_Prin...,True,,Fuzzy match used


This row has a manual check advised because a fuzzy match was used. A spelling error in the country name in the file name ("Principle" instead of "Principe") meant the country could not be matched exactly and had to be matched on similarity instead. Since the data was unzipped to STP_Sao_Tome_and_Principe, the match appears to be correct. No action was needed.
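
The idea behind a fuzzy match can be illustrated with the standard library. The notebook installs `rapidfuzz`, and how `MICS_module` uses it is not shown here, so this sketch swaps in `difflib` to stay dependency-free. `match_country`, the candidate list, and the `0.85` cutoff are all assumptions made for the example.

```python
import difflib
from typing import List, Optional, Tuple


def match_country(
    name: str, known_names: List[str], cutoff: float = 0.85
) -> Tuple[Optional[str], bool]:
    """Return (matched_name, used_fuzzy_match) for a normalized country name."""
    if name in known_names:
        return name, False  # exact match, no manual check needed
    close = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    if close:
        return close[0], True  # similar enough, but worth a manual check
    return None, False  # nothing close enough to trust


known = ["sao tome and principe", "thailand", "senegal"]

fuzzy_result = match_country("sao tome and principle", known)
exact_result = match_country("thailand", known)
no_result = match_country("atlantis", known)

print(fuzzy_result)  # ('sao tome and principe', True)
print(exact_result)  # ('thailand', False)
print(no_result)     # (None, False)
```

Returning a flag alongside the match mirrors the purpose of the `manual_check_advised` column: the match is accepted, but the row is marked so someone can confirm the destination folder.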