## 11_scraping_volcanic_eruption_data.ipynb
<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"><b>This script scrapes the volcano and volcanic eruption data from the Smithsonian Institution GVP databases.</b> The main routines were developed in previous courses at the University of London by the same author (Mohr, 2021, 2023, 2024a). They have been developed further to meet the needs of the scraping procedure for this MSc thesis and modified to match the latest requirements and package inter-dependencies. Some comments are added in this Jupyter Notebook, and the code has several inline comments. For the project/research itself, see the appropriate document.
</p>

#### Reference list (for this script)
*Mohr, S. (2024) Trends in Worldwide Subaerial Volcanic Eruptions from 1920 - 2019. DSM050, Data Visualisation, examined coursework cw2. University of London.*

#### History
<pre>
241018 Generation from previous courseworks at the UoL, re-write query_eruptions and query_holocene_volcanoes to use 
       get_data_from_web_api instead of get_data_from_smithonian_api, move get_data_from_web_api to
       shared_procedures.py, use procedure save_dataset, reformatting names and variables, add logging,
       sleep for printing before logging, check and re-generate docstrings
241203 Set parameters and scrape data.
250104 Check docstrings
</pre>

#### Todo
<pre>./.</pre>

## Preparing the environment
### System information

In [1]:
# which python installation and version are we using here?
print('\n******* Python Info ***********')
!which python
!python --version

# show some CPU and RAM info
print('\n******* CPU Info ***********')
!lscpu
print('\n******* RAM Info (in GB) ***********')
!free -g


******* Python Info ***********
/bin/python
Python 3.8.10

******* CPU Info ***********
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             64
On-line CPU(s) list:                0-63
Thread(s) per core:                 2
Core(s) per socket:                 8
Socket(s):                          4
NUMA node(s):                       4
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
Stepping:                           7
CPU MHz:                            1200.768
CPU max MHz:                        4000.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           6600.00
Virtualization:                     VT-x
L1

In [2]:
# show installed packages and versions
!pip freeze

absl-py==2.1.0
affine==2.4.0
aggdraw==1.3.16
array-record==0.4.0
asttokens==2.4.1
astunparse==1.6.3
atomicwrites==1.1.5
attrs==19.3.0
Automat==0.8.0
backcall==0.2.0
beautifulsoup4==4.8.2
blinker==1.4
cachetools==5.5.0
certifi==2019.11.28
chardet==3.0.4
click==8.1.7
click-plugins==1.1.1
cligj==0.7.2
cloud-init==24.3.1
colorama==0.4.3
comm==0.2.2
command-not-found==0.3
configobj==5.0.6
confluent-kafka==2.5.3
constantly==15.1.0
contextily==1.5.2
contourpy==1.1.1
cryptography==2.8
cupshelpers==1.0
cycler==0.10.0
dbus-python==1.2.16
debugpy==1.8.7
decorator==4.4.2
defer==1.0.6
distro==1.4.0
distro-info==0.23+ubuntu1.1
dm-tree==0.1.8
entrypoints==0.3
et-xmlfile==1.0.1
etils==1.3.0
executing==2.0.1
fail2ban==0.11.1
fastjsonschema==2.20.0
filelock==3.13.1
fiona==1.9.6
flatbuffers==24.3.25
fonttools==4.53.1
fsspec==2023.12.2
ftfy==6.2.0
gast==0.4.0
geographiclib==2.0
geopandas==0.13.2
geopy==2.4.1
google-auth==2.36.0
google-auth-oauthlib==1.

### Setting PATH correctly

In [3]:
# there has been a PATH error somewhere on LENA for a while
# adding my packages path to Python's module search path (sys.path)

import sys
sys.path.append("/home/smohr001/.local/lib/python3.8/site-packages")
sys.path

['/home/smohr001/thesis',
 '/usr/lib/python38.zip',
 '/usr/lib/python3.8',
 '/usr/lib/python3.8/lib-dynload',
 '',
 '/opt/jupyterhub/lib/python3.8/site-packages',
 '/opt/jupyterhub/lib/python3.8/site-packages/IPython/extensions',
 '/home/smohr001/.ipython',
 '/home/smohr001/.local/lib/python3.8/site-packages']
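
Re-running the cell above appends the same directory again on every execution. A minimal sketch of an idempotent variant follows; the helper name `add_to_sys_path` and the placeholder path are hypothetical and not part of this notebook.

```python
import sys


def add_to_sys_path(path):
    """Append `path` to sys.path only if it is not listed yet (hypothetical helper)."""
    if path not in sys.path:
        sys.path.append(path)
    return sys.path


# placeholder path for illustration only; the notebook uses the user's own site-packages
example_path = "/tmp/example-site-packages"
add_to_sys_path(example_path)
add_to_sys_path(example_path)  # the second call changes nothing
```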

### Loading libraries

In [4]:
# importing standard libraries
import sys
import os
import warnings
import datetime
import time
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import logging

# importing shared procedures for this project (needs to be a simple .py file)
%run shared_procedures.py

# importing additional libraries
import requests
from requests.exceptions import HTTPError
import xml.etree.ElementTree as ET

# get info about installed and used versions of some important libraries
print("Some important installed libraries:\n")
print(f"Pandas version: {pd.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"Seaborn version: {sns.__version__}")

Some important installed libraries:

Pandas version: 1.4.1
Numpy version: 1.22.2
Seaborn version: 0.13.2


#### Set up parameters and identification of this script

In [5]:
# show all matplotlib graphs inline
%matplotlib inline

# setting format to JPG for easy copy & paste for figures
# for high quality outputs choose 'svg'
%config InlineBackend.figure_format = 'jpg'

# adjust display settings to show 20 rows as a standard
pd.set_option('display.max_rows', 20)

# ignore warnings (low priority)
warnings.filterwarnings('ignore')

# set script (ipynb notebook) name (e.g. for logging)
script_name = "11_scrape_volcanic_eruption_data.ipynb"

# start parameterized logging
setup_logging(logfile_dir = "log", 
              logfile_name = "10_data_scraping.log", 
              log_level = logging.INFO, 
              script_name = script_name
             )

# set data directory
data_dir = "data"
logging.info(f"{script_name}: Set data directory to './{data_dir}'.")

2025-01-10 13:34:11,013 - INFO - Starting script '11_scrape_volcanic_eruption_data.ipynb'.
2025-01-10 13:34:11,014 - INFO - Set loglevel to INFO.
2025-01-10 13:34:11,015 - INFO - 11_scrape_volcanic_eruption_data.ipynb: Set data directory to './data'.
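
`setup_logging` is defined in `shared_procedures.py` and is not shown in this notebook. The following is only a sketch of how such a helper could produce the "timestamp - LEVEL - message" lines seen above, assuming it is built on `logging.basicConfig`; the name `setup_logging_sketch` and the demo file names are hypothetical.

```python
import logging
import os
import tempfile


def setup_logging_sketch(logfile_dir, logfile_name, log_level, script_name):
    """Hypothetical stand-in for setup_logging from shared_procedures.py."""
    os.makedirs(logfile_dir, exist_ok=True)
    logging.basicConfig(
        level=log_level,
        # reproduces the "timestamp - LEVEL - message" layout of the output above
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[
            logging.FileHandler(os.path.join(logfile_dir, logfile_name)),
            logging.StreamHandler(),
        ],
        force=True,  # replace handlers left over from earlier cells (Python >= 3.8)
    )
    logging.info(f"Starting script '{script_name}'.")
    logging.info(f"Set loglevel to {logging.getLevelName(log_level)}.")


# usage with a temporary directory, so nothing is written into the project folder
demo_dir = tempfile.mkdtemp()
setup_logging_sketch(demo_dir, "demo.log", logging.INFO, "demo.ipynb")
with open(os.path.join(demo_dir, "demo.log")) as handle:
    log_text = handle.read()
```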


#### Checking connectivity of Smithsonian volcano and eruption database APIs
Before querying the databases to get the two datasets, a manual test is done by providing a simple query. This also shows the resulting data and dataset structure.

In [6]:
# GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes
# https://volcano.si.edu/database/webservices.cfm
query_parameters = {
    "maxFeatures": "1"
}

query_status, query_answer = get_data_from_web_api(
    url = "https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes",
    query_parameters = query_parameters,
    verbosity = 0
)

if query_status:
    print("GML (XML) format\n")
    print(query_answer.text)
else:
    print("\nSome error occurred! Nothing to print!")

GML (XML) format

<?xml version="1.0" encoding="UTF-8"?><wfs:FeatureCollection xmlns="http://www.opengis.net/wfs" xmlns:wfs="http://www.opengis.net/wfs" xmlns:gml="http://www.opengis.net/gml" xmlns:GVP-VOTW="volcano.si.edu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wfs https://webservices.volcano.si.edu/geoserver/schemas/wfs/1.0.0/WFS-basic.xsd volcano.si.edu https://webservices.volcano.si.edu/geoserver/GVP-VOTW/wfs?service=WFS&amp;version=1.0.0&amp;request=DescribeFeatureType&amp;typeName=GVP-VOTW%3ASmithsonian_VOTW_Holocene_Volcanoes"><gml:boundedBy><gml:null>unknown</gml:null></gml:boundedBy><gml:featureMember><GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes fid="Smithsonian_VOTW_Holocene_Volcanoes.fid--71e2f013_1944fc39014_34cf"><GVP-VOTW:Volcano_Number>352030</GVP-VOTW:Volcano_Number><GVP-VOTW:Volcano_Name>Antisana</GVP-VOTW:Volcano_Name><GVP-VOTW:Volcanic_Landform>Composite</GVP-VOTW:Volcanic_Landform><GVP-VOTW:Primary_Volcano_Typ
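
The URL passed to `get_data_from_web_api` already contains a query string, and `maxFeatures` is supplied separately. Assuming the helper forwards `query_parameters` to `requests` as the `params` argument (the helper is not shown here, so this is an assumption), the extra parameters are appended to the existing query string with `&`. The standard-library sketch below reproduces that merge without sending a request.

```python
from urllib.parse import urlencode, urlsplit

base_url = (
    "https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows"
    "?service=WFS&version=1.0.0&request=GetFeature"
    "&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes"
)
query_parameters = {"maxFeatures": "1"}

# use "&" if the URL already has a query string, otherwise start one with "?"
separator = "&" if urlsplit(base_url).query else "?"
full_url = base_url + separator + urlencode(query_parameters)
print(full_url)
```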

In [7]:
# GVP-VOTW:Smithsonian_VOTW_Holocene_Eruptions
# https://volcano.si.edu/database/webservices.cfm
query_parameters = {
    "maxFeatures": "1"
}

query_status, query_answer = get_data_from_web_api(
    url = "https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Eruptions",
    query_parameters = query_parameters,
    verbosity = 0
)

if query_status:
    print("GML (XML) format\n")
    print(query_answer.text)
else:
    print("\nSome error occurred! Nothing to print!")

GML (XML) format

<?xml version="1.0" encoding="UTF-8"?><wfs:FeatureCollection xmlns="http://www.opengis.net/wfs" xmlns:wfs="http://www.opengis.net/wfs" xmlns:gml="http://www.opengis.net/gml" xmlns:GVP-VOTW="volcano.si.edu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wfs https://webservices.volcano.si.edu/geoserver/schemas/wfs/1.0.0/WFS-basic.xsd volcano.si.edu https://webservices.volcano.si.edu/geoserver/GVP-VOTW/wfs?service=WFS&amp;version=1.0.0&amp;request=DescribeFeatureType&amp;typeName=GVP-VOTW%3ASmithsonian_VOTW_Holocene_Eruptions"><gml:boundedBy><gml:null>unknown</gml:null></gml:boundedBy><gml:featureMember><GVP-VOTW:Smithsonian_VOTW_Holocene_Eruptions fid="Smithsonian_VOTW_Holocene_Eruptions.fid--71e2f013_1944fc39014_34d0"><GVP-VOTW:Volcano_Number>352030</GVP-VOTW:Volcano_Number><GVP-VOTW:Volcano_Name>Antisana</GVP-VOTW:Volcano_Name><GVP-VOTW:Eruption_Number>11505</GVP-VOTW:Eruption_Number><GVP-VOTW:Activity_Type>Uncertain Eru
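
Both connectivity checks rely on `get_data_from_web_api` from `shared_procedures.py`, which is not reproduced in this notebook. The following is a minimal sketch of a helper with the same call shape (a status flag plus the response object), assuming it wraps `requests.get`. The function name, the `timeout` parameter and the meaning given to `verbosity` are assumptions for illustration.

```python
import requests
from requests.exceptions import RequestException


def get_data_from_web_api_sketch(url, query_parameters, verbosity=0, timeout=30):
    """Hypothetical stand-in: return (True, response) on success, else (False, None)."""
    try:
        response = requests.get(url, params=query_parameters, timeout=timeout)
        response.raise_for_status()  # turn HTTP 4xx/5xx answers into an exception
    except RequestException as error:
        # assumption: verbosity > 0 means "print the error"
        if verbosity > 0:
            print(f"Request failed: {error}")
        return False, None
    return True, response


# a URL without a scheme is rejected before any network traffic is sent
status, answer = get_data_from_web_api_sketch("not-a-valid-url", {"maxFeatures": "1"})
```

Returning a flag together with the response keeps the calling code free of `try`/`except` blocks, which matches how the two query methods use the helper.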

#### Main query methods
The method *query_holocene_volcanoes* uses the shared method *get_data_from_web_api* to query the Holocene Volcanoes database. It returns the resulting dataset with 11 features.

In [8]:
def query_holocene_volcanoes(api_query_parameters):
    """
    Queries the Smithsonian Volcano Database for Holocene volcano data and returns a pandas DataFrame containing the results.

    Parameters:
        api_query_parameters : dict
            A dictionary containing query parameters for the Smithsonian Volcano Database API. Parameters can include filters
            such as geographic region, tectonic setting, or other criteria accepted by the API endpoint.

    Returns:
        pandas.DataFrame or None
            A DataFrame containing the queried Holocene volcano data. Columns typically include:
            'Volcano_Number', 'Volcano_Name', 'Primary_Volcano_Type', 'Country', 'Region', 'Latitude', 'Longitude', 
            'Elevation', 'Tectonic_Setting', 'Evidence_Category', and 'Major_Rock_Type'.
            If the query fails or no parameters are given, the function returns None.

    Raises:
        No exception is raised explicitly. If 'api_query_parameters' is empty or the API query fails,
        an error is logged and None is returned. Errors raised while parsing the XML are not caught.

    Logs:
        Logs the start and end of the querying process, the number of volcanoes retrieved and parsed,
        and errors encountered during query and parsing.

    Notes:
        The query makes use of the Smithsonian Institution's Volcano Web Service (VOTW) API and retrieves data
           in XML format, which is parsed into a pandas DataFrame.
        Elements are chosen from a list of possible elements, defined in the local list 'elements'.
        Ensure 'api_query_parameters' are provided in a valid format supported by the API.
        This docstring was generated with the help of AI and proofread by the author.
    """
    
    # are some parameters set?
    if api_query_parameters:
        
        print("======================================================================================================")
        print("Querying Smithsonian Volcano Database --> Holocene Volcanoes")
        logging.info(f"query_holocene_volcanoes: START main query method for holocene volcanoes.")

        # initialize timing information for this routine
        start = time.time()

        # initialize empty dataframe
        api_volcanoes = pd.DataFrame()

        # show the parameters
        print("\nQuery parameters: " + str(api_query_parameters))

        # print info
        print("Getting Smithsonian Volcano data  ...")

        # query the API (verbosity = 0)
        api_query_status_ok, api_response = \
            get_data_from_web_api(url = "https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes",
                                  query_parameters = api_query_parameters,
                                  verbosity = 0
                                 )

        # parse the queried data
        if api_query_status_ok:
            # query should be okay, go ahead
            print("Got data, parsing ...")

            # define the namespace(s)
            ns = {
                'wfs': "http://www.opengis.net/wfs",
                'gml': "http://www.opengis.net/gml",
                'GVP-VOTW': "volcano.si.edu"
            }

            # parse the XML
            root = ET.fromstring(api_response.text)

            # count the number of features
            feature_count = len(root.findall('gml:featureMember', ns))

            # show number of found volcanoes
            print("Number of volcanoes:", feature_count, "\n")

            # create a list to store the volcanoes
            api_volcanoes = []

            # define all possible elements based on the XSD schema (see below)
            elements = [
                "Volcano_Number", "Volcano_Name", "Primary_Volcano_Type", 
                "Country", "Region", "Latitude", "Longitude", 
                "Elevation", "Tectonic_Setting", "Evidence_Category", 
                "Major_Rock_Type"
            ]

            # iterate through each featureMember
            for member in root.findall('gml:featureMember', ns):
                volcano = member.find('GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes', ns)
                record = {}
                for elem in elements:
                    element = volcano.find(f'GVP-VOTW:{elem}', ns)
                    record[elem] = element.text if element is not None else None
                
                # append this volcano to the list of volcanoes
                api_volcanoes.append(record)

            # create a df from the list of volcanoes
            api_volcanoes_df = pd.DataFrame(api_volcanoes)

            # print some final information (number of parsed events and runtime of routine)
            print("Total number of parsed volcanoes: " + str(len(api_volcanoes)))
            print("Runtime to query and parse the data: " + str(round(time.time() - start, 1)) + " s")
            print("======================================================================================================")       
            time.sleep(0.5)
            logging.info(f"query_holocene_volcanoes: END main query method for volcanoes with {len(api_volcanoes)} volcanoes in {round(time.time() - start, 1)} s.")

            # reset index
            api_volcanoes_df.reset_index(drop=True, inplace=True)

            # return the dataframe with found volcanoes
            return api_volcanoes_df
                
        else:
            # bad query result
            logging.error(f"query_holocene_volcanoes: Bad query result status!")
            
            # return nothing (none)
            return None
        
    # no input parameters available
    else:
        # unsuccessful query 
        logging.error(f"query_holocene_volcanoes: No input parameters available!")
        
        # return nothing (none)
        return None
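
The namespace-aware parsing used in the function above can be tried offline on a small fragment. The sketch below uses a hand-written sample in which only `Volcano_Number` and `Volcano_Name` are taken from the connectivity test output. The function name `parse_features` is hypothetical. It returns a list of dictionaries that `pd.DataFrame` accepts directly.

```python
import xml.etree.ElementTree as ET

# namespaces as declared in the WFS response shown earlier
NS = {
    "wfs": "http://www.opengis.net/wfs",
    "gml": "http://www.opengis.net/gml",
    "GVP-VOTW": "volcano.si.edu",
}

# hand-written sample for illustration, not a complete API response
SAMPLE = """
<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
                       xmlns:gml="http://www.opengis.net/gml"
                       xmlns:GVP-VOTW="volcano.si.edu">
  <gml:featureMember>
    <GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes>
      <GVP-VOTW:Volcano_Number>352030</GVP-VOTW:Volcano_Number>
      <GVP-VOTW:Volcano_Name>Antisana</GVP-VOTW:Volcano_Name>
    </GVP-VOTW:Smithsonian_VOTW_Holocene_Volcanoes>
  </gml:featureMember>
</wfs:FeatureCollection>
"""


def parse_features(xml_text, type_name, elements):
    """Collect the requested child elements of every feature (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = []
    for member in root.findall("gml:featureMember", NS):
        feature = member.find(f"GVP-VOTW:{type_name}", NS)
        if feature is None:
            continue  # skip members of an unexpected type instead of failing
        record = {}
        for name in elements:
            node = feature.find(f"GVP-VOTW:{name}", NS)
            record[name] = node.text if node is not None else None
        records.append(record)
    return records


records = parse_features(
    SAMPLE,
    "Smithsonian_VOTW_Holocene_Volcanoes",
    ["Volcano_Number", "Volcano_Name", "Country"],
)
print(records)
```

Elements missing from a feature (here `Country`) end up as `None`, which is the same behaviour as in the query methods.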


The method *query_eruptions* uses the shared method *get_data_from_web_api* to query the Smithsonian Holocene Eruptions database. It returns the resulting dataset with 19 queried features, expanded to 21 by splitting the geolocation into *Longitude* and *Latitude*.

In [9]:
def query_eruptions(api_query_parameters):
    """
    Queries the Smithsonian Volcano Database for Holocene eruptions data and returns a pandas DataFrame containing the results.

    Parameters:
        api_query_parameters : dict
            A dictionary containing query parameters for the Smithsonian Volcano Database API. Parameters can include filters
            such as volcano number, activity type, and date ranges accepted by the API endpoint.

    Returns:
        pandas.DataFrame or None
            A DataFrame containing the queried Holocene eruptions data. Columns typically include:
            'Volcano_Number', 'Volcano_Name', 'Eruption_Number', 'Activity_Type', 'ExplosivityIndexMax',
            'ExplosivityIndexModifier', 'ActivityArea', 'ActivityUnit', 'StartEvidenceMethod',
            'StartDateYearModifier', 'StartDateYear', 'StartDateMonth', 'StartDateDayModifier', 'StartDateDay',
            'EndDateYearModifier', 'EndDateYear', 'EndDateMonth', 'EndDateDayModifier', 'EndDateDay',
            'Longitude', and 'Latitude'.
            If the query fails or no parameters are given, the function returns None.

    Raises:
        No exception is raised explicitly. If 'api_query_parameters' is empty or the API query fails,
        an error is logged and None is returned. Errors raised while parsing the XML are not caught.

    Logs:
        Logs the start and end of the querying process, the number of eruptions retrieved and parsed,
        and errors encountered during query and parsing.

    Notes:
        The query utilizes the Smithsonian Institution's Volcano Web Service (VOTW) API and retrieves data in XML format,
           which is parsed into a pandas DataFrame.
        The 'api_query_parameters' must be valid as per the API specifications.
        Elements are chosen from a list of possible elements, defined in the local list 'elements'.
        The geolocation coordinates are extracted from the XML data and split into 'Longitude' and 'Latitude'.
        This docstring was generated with the help of AI and proofread by the author.
    """
    
    # are some parameters set?
    if api_query_parameters:
        
        print("======================================================================================================")
        print("Querying Smithsonian Volcano Database --> Eruptions")
        logging.info(f"query_eruptions: START main query method for eruptions.")

        # initialize timing information for this routine
        start = time.time()

        # initialize empty dataframe
        api_eruptions = pd.DataFrame()

        # show the parameters
        print("\nQuery parameters: " + str(api_query_parameters))

        # print info
        print("Getting Smithsonian Eruption data  ...")

        # query the API (verbosity = 0)
        api_query_status_ok, api_response = \
            get_data_from_web_api(url = "https://webservices.volcano.si.edu/geoserver/GVP-VOTW/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=GVP-VOTW:Smithsonian_VOTW_Holocene_Eruptions",
                                  query_parameters = api_query_parameters,
                                  verbosity = 0
                                 )

        # parse the queried data
        if api_query_status_ok:
            # query should be okay, go ahead
            print("Got data, parsing ...")

            # define the namespace(s)
            ns = {
                'wfs': "http://www.opengis.net/wfs",
                'gml': "http://www.opengis.net/gml",
                'GVP-VOTW': "volcano.si.edu"
            }

            # parse the XML
            root = ET.fromstring(api_response.text)

            # count the number of features
            feature_count = len(root.findall('gml:featureMember', ns))

            # show number of found eruptions
            print("Number of eruptions:", feature_count, "\n")

            # create a list to store the eruptions
            api_eruptions = []

            # define all possible elements based on the XSD schema (see below)
            elements = [
                "Volcano_Number", "Volcano_Name", "Eruption_Number", "Activity_Type",
                "ExplosivityIndexMax", "ExplosivityIndexModifier", "ActivityArea",
                "ActivityUnit", "StartEvidenceMethod",
                "StartDateYearModifier", "StartDateYear", "StartDateMonth", "StartDateDayModifier", "StartDateDay",
                "EndDateYearModifier", "EndDateYear", "EndDateMonth", "EndDateDayModifier", "EndDateDay"
            ]

            # iterate through each featureMember
            for member in root.findall('gml:featureMember', ns):
                volcano = member.find('GVP-VOTW:Smithsonian_VOTW_Holocene_Eruptions', ns)
                record = {}
                for elem in elements:
                    element = volcano.find(f'GVP-VOTW:{elem}', ns)
                    record[elem] = element.text if element is not None else None

                # Extract and split GeoLocation
                location = volcano.find('.//gml:coordinates', ns)
                if location is not None:
                    coordinates = location.text.split(',')
                    record['Longitude'] = coordinates[0].strip()
                    record['Latitude'] = coordinates[1].strip()
                else:
                    record['Longitude'] = None
                    record['Latitude'] = None
                
                # append this eruption to the list of eruptions
                api_eruptions.append(record)
    
            # create a df from the list of eruptions
            api_eruptions_df = pd.DataFrame(api_eruptions)

            # print some final information (number of parsed events and runtime of routine)
            print("Total number of parsed eruptions: " + str(len(api_eruptions)))
            print("Runtime to query and parse the data: " + str(round(time.time() - start, 1)) + " s")
            print("======================================================================================================")
            time.sleep(0.5)
            logging.info(f"query_eruptions: END main query method for eruptions with {len(api_eruptions)} eruptions in {round(time.time() - start, 1)} s.")

            # reset index
            api_eruptions_df.reset_index(drop=True, inplace=True)

            # return the dataframe with found eruptions
            return api_eruptions_df
                
        else:
            # bad query result
            logging.error(f"query_eruptions: Bad query result status!")
            
            # return nothing (none)
            return(None)
        
    # no input parameters available
    else:
        # unsuccessful query 
        logging.error(f"query_eruptions: No input parameters available!")
        
        # return nothing (none)
        return(None)
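
The coordinate handling in `query_eruptions` assumes that `gml:coordinates` always holds a well-formed `longitude,latitude` pair. A slightly more defensive variant could look like the sketch below. The helper name `split_coordinates` is hypothetical, and unlike the notebook (which keeps the values as strings) it converts them to floats.

```python
def split_coordinates(text):
    """Split a 'lon,lat' string into two floats; return (None, None) if malformed."""
    if not text:
        return None, None
    parts = [part.strip() for part in text.split(",")]
    if len(parts) < 2:
        return None, None
    try:
        # order follows the notebook: first value is longitude, second is latitude
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None, None


longitude, latitude = split_coordinates("42.833,38.931")
```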

#### Querying volcano and eruptions data
The two query methods are now used to query all available data from both databases. The data is stored in two dataframes: (1) *volcanoes* and (2) *eruptions*.

In [10]:
# get all available volcanoes (setting maxFeatures to a very high value)
query_parameters = {
    "maxFeatures": "9999"
}
volcanoes = query_holocene_volcanoes(query_parameters)

# show volcanoes
display(volcanoes)

2025-01-10 13:34:11,959 - INFO - query_holocene_volcanoes: START main query method for holocene volcanoes.


Querying Smithsonian Volcano Database --> Holocene Volcanoes

Query parameters: {'maxFeatures': '9999'}
Getting Smithsonian Volcano data  ...
Got data, parsing ...
Number of volcanoes: 1281 

Total number of parsed volcanoes: 1281
Runtime to query and parse the data: 1.6 s


2025-01-10 13:34:14,072 - INFO - query_holocene_volcanoes: END main query method for volcanoes with 1281 volcanoes in 2.1 s.


Unnamed: 0,Volcano_Number,Volcano_Name,Primary_Volcano_Type,Country,Region,Latitude,Longitude,Elevation,Tectonic_Setting,Evidence_Category,Major_Rock_Type
0,210010,West Eifel Volcanic Field,Volcanic field,Germany,European Volcanic Regions,50.1700,6.8500,600,Rift zone / Continental crust (> 25 km),Eruption Dated,Foidite
1,210020,Chaine des Puys,Lava dome(s),France,European Volcanic Regions,45.7860,2.9810,1464,Rift zone / Continental crust (> 25 km),Eruption Dated,Basalt / Picro-Basalt
2,210030,Olot Volcanic Field,Volcanic field,Spain,European Volcanic Regions,42.1700,2.5300,893,Intraplate / Continental crust (> 25 km),Evidence Credible,Trachybasalt / Tephrite Basanite
3,210040,Calatrava Volcanic Field,Volcanic field,Spain,European Volcanic Regions,38.8700,-4.0200,1117,Intraplate / Continental crust (> 25 km),Eruption Dated,Basalt / Picro-Basalt
4,211004,Colli Albani,Caldera,Italy,European Volcanic Regions,41.7569,12.7251,949,Subduction zone / Continental crust (> 25 km),Evidence Uncertain,Foidite
...,...,...,...,...,...,...,...,...,...,...,...
1276,221294,Northern Lake Abaya Volcanic Field,Volcanic field,Ethiopia,Eastern Africa Volcanic Regions,6.7600,37.9700,1594,Intraplate / Continental crust (> 25 km),Evidence Credible,
1277,221330,Mega Volcanic Field,Volcanic field,Ethiopia,Eastern Africa Volcanic Regions,3.9710,38.2130,1500,Rift zone / Intermediate crust (15-25 km),Evidence Credible,
1278,312055,Stepovak Bay Group,Volcanic field,United States,North America Volcanic Regions,55.9170,-160.0170,1633,Subduction zone / Continental crust (> 25 km),Evidence Credible,
1279,244021,Malumalu,Stratovolcano,United States,Southern Pacific Volcanic Regions,-14.6010,-169.7870,-145,Intraplate / Oceanic crust (< 15 km),Evidence Credible,
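
Because every field is taken from `element.text`, all columns of the returned DataFrame hold strings (or `None`), including numeric ones such as `Latitude` and `Elevation`. If numeric types are needed downstream, a conversion along the following lines could be applied. This is a sketch on two rows copied from the table above, not a step performed in this notebook.

```python
import pandas as pd

# two rows copied from the volcano table above; every value is still a string
df = pd.DataFrame({
    "Volcano_Number": ["210010", "244021"],
    "Latitude": ["50.1700", "-14.6010"],
    "Longitude": ["6.8500", "-169.7870"],
    "Elevation": ["600", "-145"],
})

numeric_columns = ["Volcano_Number", "Latitude", "Longitude", "Elevation"]
for column in numeric_columns:
    # errors="coerce" turns unparsable entries into NaN instead of raising
    df[column] = pd.to_numeric(df[column], errors="coerce")

print(df.dtypes)
```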


In [11]:
# get all available eruptions (setting maxFeatures to a very high value)
query_parameters = {
    "maxFeatures": "999999"
}
eruptions = query_eruptions(query_parameters)

# show eruptions
display(eruptions)

2025-01-10 13:34:14,105 - INFO - query_eruptions: START main query method for eruptions.


Querying Smithsonian Volcano Database --> Eruptions

Query parameters: {'maxFeatures': '999999'}
Getting Smithsonian Eruption data  ...
Got data, parsing ...
Number of eruptions: 11130 

Total number of parsed eruptions: 11130
Runtime to query and parse the data: 2.9 s


2025-01-10 13:34:17,483 - INFO - query_eruptions: END main query method for eruptions with 11130 eruptions in 3.4 s.


Unnamed: 0,Volcano_Number,Volcano_Name,Eruption_Number,Activity_Type,ExplosivityIndexMax,ExplosivityIndexModifier,ActivityArea,ActivityUnit,StartEvidenceMethod,StartDateYearModifier,...,StartDateMonth,StartDateDayModifier,StartDateDay,EndDateYearModifier,EndDateYear,EndDateMonth,EndDateDayModifier,EndDateDay,Longitude,Latitude
0,213021,Suphan Dagi,13907,Uncertain Eruption,,,,,Correlation: Tephrochronology,?,...,0,,0,,,,,,42.833,38.931
1,213020,Nemrut Dagi,10039,Confirmed Eruption,,,,,Sidereal: Varve Count,?,...,0,,0,,,,,,42.229,38.654
2,213020,Nemrut Dagi,10044,Confirmed Eruption,,,,,Sidereal: Varve Count,,...,0,,0,,,,,,42.229,38.654
3,213020,Nemrut Dagi,10056,Confirmed Eruption,,,,,Sidereal: Varve Count,?,...,0,,0,,,,,,42.229,38.654
4,213020,Nemrut Dagi,13312,Confirmed Eruption,,,,,Sidereal: Varve Count,,...,0,,0,,,,,,42.229,38.654
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11125,600000,Unknown Source,13237,Confirmed Eruption,,,"(GISP2,109 ppb sulfate)",,Sidereal: Ice Core,,...,0,,0,,,,,,,
11126,600000,Unknown Source,13238,Confirmed Eruption,,,"(GISP2, 94 ppb sulfate)",,Sidereal: Ice Core,,...,0,,0,,,,,,,
11127,600000,Unknown Source,13239,Confirmed Eruption,,,"(GISP2, 109 ppb sulfate)",,Sidereal: Ice Core,,...,0,,0,,,,,,,
11128,600000,Unknown Source,13240,Confirmed Eruption,,,"(GISP2, 97 ppb sulfate)",,Sidereal: Ice Core,,...,0,,0,,,,,,,
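
Both queries rely on `maxFeatures` being larger than the number of records in the database. If a result ever contained as many features as the cap, records might be missing. A small sanity check could flag that case. The helper name `may_be_truncated` is hypothetical, and the counts are the ones printed above.

```python
def may_be_truncated(feature_count, max_features):
    """Return True if the result count reached the requested cap (hypothetical helper)."""
    return feature_count >= int(max_features)


# counts and caps taken from the two queries above
checks = {
    "volcanoes": may_be_truncated(1281, "9999"),
    "eruptions": may_be_truncated(11130, "999999"),
}
print(checks)
```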


#### Save scraped datasets

In [12]:
# save volcano dataset
save_dataset(data_file = "volcanoes_scraped.csv", 
             data_dir = data_dir, 
             data_set = volcanoes
            )  

2025-01-10 13:34:17,558 - INFO - save_dataset: Data saved successfully to 'data/volcanoes_scraped_250110-133417.csv'.


In [13]:
# save eruption dataset
save_dataset(data_file = "eruptions_scraped.csv", 
             data_dir = data_dir, 
             data_set = eruptions
            )  

2025-01-10 13:34:17,619 - INFO - save_dataset: Data saved successfully to 'data/eruptions_scraped_250110-133417.csv'.
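
The log lines above show that `save_dataset` adds a timestamp to the requested file name. The helper itself lives in `shared_procedures.py` and is not shown. The following sketch only reproduces the naming pattern visible in the log, assuming a `yymmdd-HHMMSS` suffix inserted before the extension. The function name `timestamped_path` is hypothetical.

```python
import datetime
import os


def timestamped_path(data_dir, data_file, now=None):
    """Build '<data_dir>/<stem>_<yymmdd-HHMMSS><ext>' (hypothetical helper)."""
    now = now or datetime.datetime.now()
    stem, extension = os.path.splitext(data_file)
    return os.path.join(data_dir, f"{stem}_{now.strftime('%y%m%d-%H%M%S')}{extension}")


# fixed timestamp taken from the log line above, so the result is reproducible
example = timestamped_path(
    "data", "eruptions_scraped.csv", datetime.datetime(2025, 1, 10, 13, 34, 17)
)
print(example)
```

A timestamped name keeps earlier scrapes from being overwritten, at the cost of having to pick the right file in later notebooks.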


#### End of script

In [14]:
# log the end of this script
logging.info(f"End of script '{script_name}'.")

2025-01-10 13:34:17,623 - INFO - End of script '11_scrape_volcanic_eruption_data.ipynb'.
