<em>This notebook explains how to get and plot data from the USGS Earthquake Catalog.

The USGS Earthquake Catalog is a database of earthquakes that have occurred around the world. Using the "libcomcat" library, we can access the data and use it in our study, which analyzes the hazard posed by earthquakes in an arbitrary area of interest.

For our repository we have decided to use the following convention: the first letter shows the folder level, "A" for the first level and "B" for the second. The numbers that follow the letter are used to order the folders as we see fit.

Scripts and folders related to seismic events have an "eq_" prefix.</em>



# Organization of the repository

---

The scripts related to the seismic-volcanic correlation study are organized in the following way:

-<strong>A00_data</strong>: This folder is where the data is downloaded and stored. For now, it only serves as temporary storage. Inside it, <em>"B_eq_processed"</em> and <em>"B_eq_raw"</em> are the folders where earthquake data is stored.

-<strong>A01_source</strong>: This folder contains the scripts used to download and process the data. It holds the folder <em>"B01_eq_download"</em>, with the script <em>"eq_download.py"</em> and the <em>"utils.py"</em> code that it uses. This script downloads the data from the USGS Earthquake Catalog and stores it in the folder <em>"A00_data"</em>. There is also the directory <em>"B01_4_eq_processing"</em>, with <em>"preprocess.py"</em> and <em>"process.py"</em>. The first contains functions that run some calculations once the data has been downloaded and then apply the filter (a maximum trigger index). The second is in charge of asking the user for input.

<em><strong>Note</strong>:</em> search parameters have to be changed at the end of the script <em>"download.py"</em>, in the folder <em>"B01_eq_download"</em>. Even though a more convenient way should be easy to implement, it has not been done yet because the goal was to create a GUI in the dashboard, and that has not been possible so far.

-<strong>A04_web</strong>: This directory contains the Quarto script that displays the dashboard where all the data is shown.

# Functions:


## Functions from B01_2_eq_download

---

### Script: eq_download.py

    -search_by_min_magnitude(date_i, date_f, min_magn, center_coords, reg_rad)

This function calls the libcomcat functions search(), get_summary_data_frame() and get_detail_data_frame() to get the basic information on the events and their magnitude information.

Its parameters are: starting date (date_i), ending date (date_f), minimum magnitude of the events (min_magn), coordinates of the volcano we want to study (center_coords, a list of the form (latitude, longitude)) and the radius of the area we want to study in <em>km</em> (reg_rad).

The function returns a dataframe with the parameters needed to compute the trigger index. It is built by merging the two dataframes obtained from the libcomcat functions with the auxiliary function working_df().
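
As an illustration of the call sequence described above, here is a minimal sketch. The helper names `build_search_kwargs` and `fetch_event_frames` and the example coordinates are assumptions made for this example, not the code in eq_download.py; the keyword names are those accepted by libcomcat's search(). The libcomcat imports are placed inside the function so that the parameter mapping can be tried without the library or network access.

```python
from datetime import datetime


def build_search_kwargs(date_i, date_f, min_magn, center_coords, reg_rad):
    """Map the parameters described above onto libcomcat search() keywords."""
    lat, lon = center_coords  # (latitude, longitude)
    return {
        "starttime": date_i,
        "endtime": date_f,
        "minmagnitude": min_magn,
        "latitude": lat,
        "longitude": lon,
        "maxradiuskm": reg_rad,
    }


def fetch_event_frames(date_i, date_f, min_magn, center_coords, reg_rad):
    """Query the catalog and return (summary_df, detail_df). Needs network access."""
    # Imported here so build_search_kwargs works without libcomcat installed.
    from libcomcat.dataframes import get_detail_data_frame, get_summary_data_frame
    from libcomcat.search import search

    events = search(
        **build_search_kwargs(date_i, date_f, min_magn, center_coords, reg_rad)
    )
    summary_df = get_summary_data_frame(events)
    # get_all_magnitudes=True asks for every reported magnitude type per event.
    detail_df = get_detail_data_frame(events, get_all_magnitudes=True)
    return summary_df, detail_df


# Example values only: one year of events within 100 km of a hypothetical point.
kwargs = build_search_kwargs(
    datetime(2021, 1, 1), datetime(2021, 12, 31), 2.5, (28.57, -17.84), 100
)
print(kwargs)
```

The two dataframes returned by `fetch_event_frames` are the kind of input that working_df() is described as merging.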

    -download_all_by_region(date_i, date_f, center_coords, reg_rad)

This is the main function of the script: it requests from the USGS catalog the data for all earthquakes in the region with a minimum magnitude of 0.001, since we may also be interested in the magnitude of completeness.

It uses the same parameters as the previous function, except that it does not require the minimum magnitude. It returns the same dataframe, but with a wider list of events.

    -download_optimized(date_i, date_f, center_coords, reg_rad)

This function calls the function that calculates, for a given radius, the minimum magnitude that satisfies the maximum trigger index condition. This way only the necessary events are downloaded instead of all the data, which allows a filter to be established before the download.
    
    -working_df(df1, df2)

As mentioned before, this is an auxiliary function used to merge the two dataframes (df1 and df2) obtained from the libcomcat functions. It is necessary because we cannot know which type of magnitude get_summary_data_frame() is going to return. With this function we merge the two dataframes by event ID and get the information we need.

    -coordinates_format(lat, lon)

This function corrects the format of the coordinates of the volcano we want to study, because the libcomcat functions require coordinates in a specific format. It takes the latitude and longitude as parameters and returns a list with the coordinates in the required format.

    -update_ref(date_i, date_f, coords, region)

This function will be used to update the parameters of the search. It is not yet required, but it will be once the GUI is implemented.

    -process_ign_file(file_name, file_path)

This was proposed at the end of the project, so it is not yet implemented. It is intended to process data from the IGN catalog. The function takes the file name and the file path as parameters and returns a dataframe with the data in the format required by the rest of the functions.

### Script: utils.py

    -saving_data(df, filename, folder="B_eq_raw")

A general function for saving data in the folder <em>"A00_data"</em>. It takes the dataframe, the file name and the folder where we want to save it. The default folder is <em>"B_eq_raw"</em>; both the folder and the file name can be changed in the code.

    -date_format(date)

It was used to transform dates into the format used in my scripts. Python already provides functions for this, but I did not know them at the time of coding.

    -limit_region_coords(lat_cent, lon_cent, region_rad)

Determines the square area that circumscribes a circular region of a given radius around a central point. It is used in the graphic representation of the data.
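
One way such a bounding square could be computed is sketched below. The function name `bounding_square`, the returned dictionary keys and the spherical-Earth approximation are assumptions for this example and may differ from what utils.py actually does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def bounding_square(lat_cent, lon_cent, region_rad):
    """Return the lat/lon limits of the square enclosing a circle of region_rad km."""
    # Angular half-width in latitude is the same everywhere on a sphere.
    dlat = math.degrees(region_rad / EARTH_RADIUS_KM)
    # Meridians converge towards the poles, so longitude needs a cos(lat) factor.
    # This breaks down very close to the poles, where cos(lat) approaches zero.
    dlon = math.degrees(
        region_rad / (EARTH_RADIUS_KM * math.cos(math.radians(lat_cent)))
    )
    return {
        "lat_min": lat_cent - dlat,
        "lat_max": lat_cent + dlat,
        "lon_min": lon_cent - dlon,
        "lon_max": lon_cent + dlon,
    }


# Example values only: a 100 km region around a hypothetical point.
limits = bounding_square(28.57, -17.84, 100.0)
print(limits)
```

The resulting limits can be passed to a plotting library as the map extent so that the whole circular search region is visible.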

    -get_lat_lot_from_file(file="wrk_df.csv")

Auxiliary function used during the development of the scripts that, given a file name, gets the latitude and longitude from that file.

    -simulate_min_mag_by_radius(radius, max_trigger_index = 100.0, L_method = "Singh")

Auxiliary function used to optimize the download of the data. It simulates the minimum magnitude of the events within a given radius that satisfies the maximum trigger index condition. This way many irrelevant events can be discarded (those with a trigger_index over 100; even this number is high and can be changed when the function is called).

    -move_file_to_project(file_name, output_file_name="external_data.csv")

Function that can be used in the future to move an external file with the data we want to use into the project. It still has to be changed, but it was originally intended for use with another catalog.

## Functions from B01_4_eq_processing

---

### Script: preprocess.py

    -fault_length(magnitude, L_method = "Singh")

Used to calculate the fault length. The default method is the one presented in the paper by Singh et al. (2008), used for a square-shaped fault. Other methods can easily be implemented and called through this function, allowing a more precise calculation for the desired region.

    -distance_calculation(lat1, lon1, lat2, lon2)

This function uses the haversine formula to calculate the distance between two coordinates, according to the Earth's spherical curvature (using the mean Earth radius given in NASA's Earth Fact Sheet) and taking the coordinate differences as if the points were on a flat map.
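
For reference, a minimal sketch of the haversine formula follows. The function name `haversine_km` is an assumption for this example, and the radius used is the commonly quoted mean Earth radius of 6371 km; the value hard-coded in preprocess.py may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(distance, 3))
```

The formula treats the Earth as a perfect sphere, which is usually accurate enough for comparing epicentral distances with fault lengths.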

    -trigger_index(L_method="Singh", file_name="wrk_df.csv")

Calculates the trigger index and adds it as a new column to the dataframe of the file given in the function parameters. A fault length calculation method can be specified only if it exists in the fault_length() function.

    -discard_by_max_trigger_index(file="wrk_df.csv", max_trigger_index= 100.0)

This function filters the main dataframe. It reads the "wrk_df.csv" file, which contains the temporary dataframe of all the events given by the catalog, and then overwrites or creates, if it does not exist, the file "trigger_index_filtered.csv". This new file is a copy of the first one, but only with the events that satisfy the maximum trigger index condition. With this function the user can consult both the whole list and the filtered list if necessary.
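
The filtering step can be sketched with the standard csv module as shown below. The helper name `filter_by_max_trigger_index`, the "id" column and the sample rows are assumptions for this example; the real function works on the "wrk_df.csv" file and may use pandas instead.

```python
import csv
import io


def filter_by_max_trigger_index(rows, max_trigger_index=100.0):
    """Keep only the rows whose trigger_index does not exceed the maximum."""
    return [row for row in rows if float(row["trigger_index"]) <= max_trigger_index]


# In-memory stand-in for the working file; the values are made up.
sample_csv = "id,trigger_index\nev1,12.5\nev2,250.0\nev3,100.0\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

kept = filter_by_max_trigger_index(rows)

# Write the filtered copy to a buffer standing in for the filtered file.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["id", "trigger_index"])
writer.writeheader()
writer.writerows(kept)
print(output.getvalue())
```

Because the original rows are left untouched, both the full list and the filtered list remain available, as described above.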

    -user_answers(dwl_opt="no", discard_trigger_index="no")

This function is used in another script and, depending on what the user answers in the terminal, calls the corresponding download functions.


### Script: process_eq_data.py

Intended to be run after changing the reference parameters at the end of the "download.py" script, this script is in charge of executing all the functions and updating the maps, histograms and tables for the display. Once executed, it automatically asks the user whether the data (dataframes in "A00_data") should be updated, which method should be used to download it, and whether the file "trigger_index_filtered.csv" should be updated as well so it can be represented on the map.

    -ask_for_data_update()

This function takes no parameters; it asks the user the questions mentioned above.
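
A terminal prompt of this kind could look like the sketch below. The helper name `ask_yes_no`, the question text and the accepted answers are assumptions for this example; the actual prompts in process_eq_data.py may be worded differently. The `ask` parameter defaults to `input` and is replaced by scripted answers here so that the example runs without a terminal.

```python
def ask_yes_no(question, ask=input):
    """Ask until the user answers 'yes' or 'no'; return True for 'yes'."""
    while True:
        answer = ask(f"{question} (yes/no): ").strip().lower()
        if answer in ("yes", "no"):
            return answer == "yes"
        print("Please answer 'yes' or 'no'.")


# Scripted answers stand in for what a user would type in the terminal.
scripted = iter(["maybe", " Yes "])
update_data = ask_yes_no("Update the data?", ask=lambda prompt: next(scripted))
print(update_data)
```

Showing the accepted answers at the end of the question keeps the expected input format visible to the user.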

    -generate_table(data, output_folder)
    -generate_map(data, output_folder, is_filtered=False)
    -generate_histogram(data, output_folder)
    -plot_events_histogram(file = "wrk_df.csv")

All these functions generate .html figures to show in the dashboard display. The required inputs are already set at the beginning of the function main(); the functions only have to be called to generate the figures.
    
    -count_events_per_month(data)

Auxiliary function used only to count the number of events per month.
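
Counting events per month can be sketched with the standard library as follows. The function name `count_events_by_month`, the "YYYY-MM" key format and the sample timestamps are assumptions for this example; the real function receives the project's dataframe.

```python
from collections import Counter
from datetime import datetime


def count_events_by_month(event_times):
    """Return a dict mapping 'YYYY-MM' to the number of events in that month."""
    counts = Counter(t.strftime("%Y-%m") for t in event_times)
    # Sort by key so the months come out in chronological order.
    return dict(sorted(counts.items()))


# Made-up event times used only to show the grouping.
times = [
    datetime(2021, 9, 19, 14, 10),
    datetime(2021, 10, 2, 8, 30),
    datetime(2021, 9, 25, 3, 0),
]
monthly = count_events_by_month(times)
print(monthly)
```

A dictionary in this shape can be fed directly to a bar chart to build the monthly histogram.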

# How the code is intended to be used
---

Many of the functions do not work on their own, so it is better to explain briefly how the code is intended to be run.

To change the parameters of the search, edit them at the end of the script <em>"download.py"</em>, in the folder <em>"B01_eq_download"</em>.

To change the maximum trigger index condition, edit the script <em>"preprocess.py"</em> in the folder <em>"B01_4_eq_processing"</em>, in the function <em>"discard_by_max_trigger_index"</em> as well as in the function <em>user</em>.

Once the parameters are set, the script <em>"process_eq_data.py"</em> has to be run. This script asks the user, in the terminal, whether the data should be updated and whether the trigger index should be calculated. Answers must be given in the form shown at the end of each question.

After this the script runs, updating and generating the figures and tables that will be shown in the dashboard. <strong>This usually takes a while, depending on the amount of data that has to be requested from the USGS service, so be patient.</strong> Some debug messages are printed in the terminal, but they are not important.

After running the script, the <strong>data</strong> will be available in the folder <em>"A00_data"</em> and the <strong>figures</strong> and <strong>table</strong> will be stored in <em>A04_web/B_images</em>, from where they can be displayed by running the <em>dashboard.qmd</em> code.