# Earthquake Ipsum


Geology is the scientific study of the Earth, including its composition, structure, and history. It examines the rocks, minerals, and processes that shape our planet. One fascinating aspect of geology is the study of earthquakes, which are natural phenomena characterized by the shaking of the Earth's crust. Earthquakes occur due to the movement of tectonic plates, and they can have significant impacts on the environment and human society.


### Usage
Explore data from your organization's model, including predictions, outcomes, interventions, and sensitive cohorts.  
Use `sm.show_info()` to explore what is available.

In [None]:
%matplotlib inline

import seismometer as sm
sm.run_startup(config_path='.')

In [None]:
sm.show_info(plot_help=True)

## Overview

#### ℹ Info

Earthquakes are typically caused by the release of accumulated stress along faults in the Earth's crust. When the stress exceeds the strength of the rocks, it results in the sudden release of energy, leading to seismic waves that propagate through the Earth. These waves cause the ground to shake, and the severity of shaking depends on factors such as the magnitude of the earthquake and the distance from the epicenter.


**Tips:** 
 - Run the cell below to create an `ipywidget` selector. Selectors are linked across notebook cells and dynamically update visualizations and reports throughout the notebook. You can also call `sm.cohort_selector()` in a new notebook cell to create the same selector elsewhere.  
 - Use the selector to stratify certain plots and reports across cohorts.

#### Selection

The study of earthquakes is crucial for understanding the Earth's dynamic nature and mitigating their potential hazards. Seismologists use various instruments, including seismographs, to measure and record seismic waves. By analyzing these recordings, scientists can determine the earthquake's location, magnitude, and focal mechanism. This information helps in assessing the earthquake's impact and developing strategies to minimize damage and protect lives.


In [None]:
sm.cohort_list()

## Feature Monitor

#### ℹ Info

Geologists also investigate the causes and effects of earthquakes. They study fault lines, which are fractures in the Earth's crust where most earthquakes occur. By examining the characteristics of faults, geologists can gain insights into the movement of tectonic plates and the forces acting upon them. This knowledge is crucial for assessing seismic hazards in different regions and developing building codes and infrastructure that can withstand earthquakes.


**Tips:**
 - This section provides insight into model inputs, demographics, and the set of interventions and outcomes. During early stages it helps validate the configuration; afterwards it helps detect feature and population drift. Read through the alerts identified for your data, then dig deeper using the feature, demographic, and event summaries, or compare across targets or demographics.
 - **Other Warnings:** The variable profiles below identify concerning trends in feature distributions. Depending on the model, you may want to adjust the configuration to silence these alerts until certain thresholds are met.
 - Run the `sm.feature_summary()`/`sm.cohort_comparison_report()`/`sm.target_feature_summary()` functions in the cells below to get a report for the corresponding dataset.
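Drift can be quantified in several ways. As one illustration only (a swapped-in technique, not necessarily what seismometer's alerts compute), the population stability index (PSI) compares the binned distribution of a baseline sample of a feature with a recent sample. The bin edges, the `eps` smoothing term, and the sample values below are hypothetical.

```python
import math
from bisect import bisect_right


def psi(baseline, recent, edges, eps=1e-6):
    """Population stability index between two samples of one numeric feature.

    `edges` are interior bin boundaries, so values fall into len(edges) + 1 bins.
    `eps` replaces empty-bin proportions to avoid log(0).
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    p = proportions(baseline)
    q = proportions(recent)
    # Sum over bins of (recent - baseline) * ln(recent / baseline)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical samples of one numeric feature
baseline = [1, 2, 2, 3, 3, 3, 4, 4, 5]
same = list(baseline)
shifted = [4, 4, 5, 5, 5, 6, 6, 7, 7]
edges = [2, 4, 6]

print(psi(baseline, same, edges))     # 0.0 for identical samples
print(psi(baseline, shifted, edges))  # a large positive value signals a shifted distribution
```

Binning keeps the comparison cheap and distribution-free; the choice of edges strongly affects the result, so in practice they are usually fixed from the baseline period.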

#### Reports

##### Feature Alerts
View automatically identified data-quality issues for the model inputs in your dataset.
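The kinds of checks behind such alerts can be sketched in plain Python. This is an assumption for illustration: it flags a feature when its missing rate exceeds a hypothetical `max_missing` cutoff, or when it has only one distinct non-missing value. Seismometer's actual alert rules and thresholds may differ, and the feature names are hypothetical.

```python
def feature_alerts_sketch(columns, max_missing=0.5):
    """Return {feature_name: [alert, ...]} for simple data-quality issues.

    `columns` maps a feature name to its list of values; None marks a missing value.
    """
    alerts = {}
    for name, values in columns.items():
        issues = []
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values) if values else 1.0
        if missing_rate > max_missing:
            issues.append(f"missing rate {missing_rate:.0%}")
        if len(set(present)) == 1:
            issues.append("constant value")
        if issues:
            alerts[name] = issues
    return alerts


# Hypothetical model inputs
data = {
    "age": [34, 51, None, 67, 45],
    "lab_value": [None, None, None, 1.2, None],
    "site": ["A", "A", "A", "A", "A"],
}
print(feature_alerts_sketch(data))
# {'lab_value': ['missing rate 80%', 'constant value'], 'site': ['constant value']}
```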

In [None]:
sm.feature_alerts()

##### Feature Summary Statistics and Plots
View the summary statistics and distributions for the model inputs in your dataset. 

In [None]:
sm.feature_summary()

##### Summarize Features by Cohort Subgroup
Run `sm.cohort_comparison_report()`, select two different groups to compare, and hit `Generate Report` to generate a comparative feature report.
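A comparative report contrasts feature distributions between the two selected groups. As a hedged illustration (the standardized mean difference is one common way to summarize such a contrast, not necessarily what the report computes), the sketch below compares one numeric feature across two hypothetical groups.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """(mean_a - mean_b) divided by the pooled SD (square root of the average sample variance)."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical ages in two cohort groups
ages_group_1 = [40, 50, 60]
ages_group_2 = [50, 60, 70]
print(standardized_mean_difference(ages_group_1, ages_group_2))  # -1.0
```

Scaling by the pooled standard deviation makes differences comparable across features with different units; each group needs at least two values and some spread for the ratio to be defined.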

In [None]:
sm.cohort_comparison_report()

##### Summarize Features by Target
Run `sm.target_feature_summary()` to get a link to a breakdown of your features stratified by the different target values.
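Stratifying by target simply groups each feature's values by the target value before summarizing. A minimal sketch, using a hypothetical `heart_rate` feature and binary `outcome` target:

```python
from collections import defaultdict
from statistics import mean


def summarize_by_target(feature_values, targets):
    """Count and mean (rounded to 1 decimal) of one feature for each distinct target value."""
    groups = defaultdict(list)
    for value, target in zip(feature_values, targets):
        groups[target].append(value)
    return {t: {"n": len(v), "mean": round(mean(v), 1)} for t, v in sorted(groups.items())}


# Hypothetical feature values and target labels
heart_rate = [72, 88, 95, 70, 101, 76]
outcome = [0, 1, 1, 0, 1, 0]
print(summarize_by_target(heart_rate, outcome))
# {0: {'n': 3, 'mean': 72.7}, 1: {'n': 3, 'mean': 94.7}}
```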

Earthquakes can have a wide range of consequences. The immediate impact includes ground shaking, which can cause buildings, bridges, and other structures to collapse. It can also trigger landslides and tsunamis in coastal areas. Furthermore, earthquakes can result in long-term effects such as changes in land elevation, groundwater levels, and the formation of new geological features. By studying past earthquakes, geologists can unravel the Earth's geological history.


In [None]:
sm.target_feature_summary()

## Model Performance

### Overall

#### ℹ Info

**Model Performance Plots**
1. (Top-left) The ROC curve, or receiver operating characteristic curve, demonstrates a model's performance across all thresholds. The curve shows the sensitivity and specificity across all possible thresholds for the model. 
    - This plot can help you assess, both in aggregate and at specific thresholds, how often the model correctly identifies positive and negative cases. The AUROC or C-stat, the area under the ROC curve, is a single measure of how well the model performs across thresholds.
2. (Top-middle) This curve plots the sensitivity and flag rate across all possible thresholds for the model. 
    - This plot can help you determine how frequently your model would trigger workflow interventions at different thresholds, and how many of those interventions would be taken for true positive cases. The highlighted area above the curve indicates how many true positives would be missed at this threshold.
3. (Top-right) The calibration curve measures how reliable a model's predicted probabilities are by comparing them with observed outcomes. 
    - This plots the observed rate (what proportion of cases at that predicted probability are true positives) against the model's predicted probability.
    - Points below the `y=x` line indicate that a model is *overconfident* in its predictions (i.e. identifying more positive cases than exist), and points above the `y=x` line indicate that a model is *underconfident* in its predictions (i.e. identifying fewer positive cases than exist).
4. (Bottom-left) The PR curve, or precision-recall curve, shows the tradeoff between precision and recall across different thresholds. 
    - The curve shows precision and recall across all possible thresholds for the model. This plot can help you assess the tradeoff between identifying more positive cases and correctly identifying positive cases. 
5. (Bottom-middle) This plot shows the sensitivity, specificity, and PPV across all possible thresholds for a model. 
    - This plot is useful for identifying thresholds where your model has high enough specificity, sensitivity, and PPV for your intended workflows.
6. (Bottom-right) This plot shows the predicted probabilities for entities in the dataset stratified by whether or not they met the target criteria. 
    - This plot can help you identify thresholds where your model will correctly identify enough of the true positives without flagging too many of the true negatives.
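The sensitivity, specificity, PPV, and flag rate in these plots all come from confusion-matrix counts at a given threshold. A minimal sketch, assuming a case is flagged when its score is greater than or equal to the threshold (the library's exact comparison may differ); the scores and labels are hypothetical.

```python
def threshold_metrics(scores, labels, threshold):
    """Confusion-matrix based metrics at one threshold (flag when score >= threshold)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (e.g. PPV when nothing is flagged) become NaN
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate / recall
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # precision
        "flag_rate": ratio(tp + fp, len(labels)),
    }


# Hypothetical predictions: four positives followed by four negatives
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.55, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(threshold_metrics(scores, labels, 0.5))
# {'sensitivity': 0.75, 'specificity': 0.5, 'ppv': 0.6, 'flag_rate': 0.625}
```

Sweeping `threshold` over the range of scores and plotting the results traces curves like those in the sensitivity/flag-rate and sensitivity/specificity/PPV panels.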

**Tips:**
 - Thresholds configured for the model are highlighted on the graphs.
 - Use `sm.model_evaluation()` to get model evaluation plots for your model at the observation level.
 - Use `sm.model_evaluation(per_context_id=True)` to get model evaluation plots for your model at the encounter level.
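Encounter-level evaluation needs one score and one label per encounter rather than per observation. The sketch below assumes one common reduction, keeping the maximum score and whether any observation met the target; this is an illustrative assumption, and the aggregation seismometer applies with `per_context_id=True` may differ. The encounter IDs are hypothetical.

```python
def rollup_by_encounter(rows):
    """Collapse (encounter_id, score, label) rows to one (max score, any label) pair per encounter."""
    encounters = {}
    for encounter_id, score, label in rows:
        best_score, any_label = encounters.get(encounter_id, (float("-inf"), 0))
        encounters[encounter_id] = (max(best_score, score), max(any_label, label))
    return encounters


# Hypothetical observation-level rows
observations = [
    ("enc-1", 0.2, 0),
    ("enc-1", 0.7, 1),
    ("enc-2", 0.4, 0),
    ("enc-2", 0.3, 0),
]
print(rollup_by_encounter(observations))
# {'enc-1': (0.7, 1), 'enc-2': (0.4, 0)}
```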

Understanding the patterns and distribution of earthquakes is essential for assessing seismic hazards. Some regions, such as the Pacific Ring of Fire, experience frequent seismic activity due to the convergence of tectonic plates. Other areas may have lower seismicity but can still be susceptible to significant earthquakes. By analyzing historical earthquake data and conducting geological surveys, scientists can identify high-risk zones and implement measures to enhance preparedness.


#### Visuals

In [None]:
sm.model_evaluation()

In [None]:
sm.model_evaluation(per_context_id=True)

### Fairness Overview

#### ℹ Info

It is important to note that decisions about which fairness metrics apply to a model should be made with deep knowledge of how the model is used, and the results should be interpreted carefully. Even when a metric is flagged as failing, there may be context that explains or even predicts the difference, so a failure generally warrants some attention but does not necessarily expose a problem.

It is mathematically impossible to ensure parity across many definitions simultaneously, so a strategy used in practice is to focus on a predetermined set of metrics while staying aware of the others.  
Like many concepts, a single parity concept can have several different names; notably, parity of true positive rate is called equal opportunity, parity of false positive rate is called predictive equality, and parity of predicted prevalence is called demographic parity.
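Each of these parity definitions compares a per-group rate. A sketch computing true positive rate, false positive rate, and predicted prevalence (the share of a group that is flagged) for each group, from hypothetical binary predictions:

```python
from collections import defaultdict


def group_rates(groups, predictions, labels):
    """Per-group TPR, FPR, and predicted prevalence from binary predictions and labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "pos": 0, "neg": 0, "flagged": 0, "n": 0})
    for group, pred, label in zip(groups, predictions, labels):
        c = counts[group]
        c["n"] += 1
        c["flagged"] += pred
        if label == 1:
            c["pos"] += 1
            c["tp"] += pred
        else:
            c["neg"] += 1
            c["fp"] += pred
    return {
        g: {
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
            "pprev": c["flagged"] / c["n"],
        }
        for g, c in counts.items()
    }


# Hypothetical groups, binary predictions, and labels
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 0, 0, 1, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
print(group_rates(groups, predictions, labels))
# {'A': {'tpr': 0.5, 'fpr': 0.5, 'pprev': 0.5}, 'B': {'tpr': 1.0, 'fpr': 0.0, 'pprev': 0.5}}
```

In this toy data both groups satisfy demographic parity (equal `pprev`) while failing equal opportunity and predictive equality, which illustrates why parity across all definitions at once is generally unattainable.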

An Aequitas audit gives an overview of parity across all defined groups for each cohort attribute. By default, the majority group is the reference, and each statistic computed for the other groups is compared to it. A fairness threshold such as 125% is then used to classify the ratio of each group's statistic to the reference. If any group's ratio falls above the threshold (125% in this example) or below its reciprocal (1/1.25 = 80%), that cohort+metric pair is considered a failure.
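The ratio test described above can be sketched as follows. The group names and metric values are hypothetical, and the lower bound is taken as the reciprocal of the threshold, matching the 80% figure; this illustrates the classification rule rather than reproducing Aequitas's implementation.

```python
def audit_metric(group_values, reference, fairness_threshold=1.25):
    """Classify each group's metric ratio to the reference group as pass (True) or fail (False)."""
    ref_value = group_values[reference]
    lower, upper = 1 / fairness_threshold, fairness_threshold
    results = {}
    for group, value in group_values.items():
        ratio = value / ref_value
        results[group] = (round(ratio, 3), lower <= ratio <= upper)
    # The cohort+metric pair fails if any group falls outside the bounds
    overall_pass = all(passed for _, passed in results.values())
    return results, overall_pass


# Hypothetical true positive rates per group
tpr_by_group = {"majority": 0.80, "group_b": 0.72, "group_c": 0.56}
print(audit_metric(tpr_by_group, reference="majority"))
# ({'majority': (1.0, True), 'group_b': (0.9, True), 'group_c': (0.7, False)}, False)
```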

The visualization is a table showing the overall pass/fail result, an ordered list of circles representing the groups, and a bar representing the percentage of the population data in the reference group. Note that comparisons across columns are not always exact, because missing information can change which observations are included for each metric. Hovering over a bar or dot gives details on the group and metric.

In the past, seismographs consisted of large pendulums and mechanical drums that recorded ground motion on a rotating cylinder covered in soot or smoked paper. However, with advancements in technology, modern seismographs now employ highly sensitive electronic sensors and digital recording systems.


In [None]:
sm.fairness_audit(metric_list=['tpr', 'fpr', 'pprev', 'precision'], fairness_threshold=1.25)

### Cohort Analysis 

In [None]:
sm.show_cohort_summaries(by_target=False, by_score=False)

#### ℹ Info

**Model Performance Plots**:
This section has general distribution plots for cohorts, crosstabs, and model performance plots stratified by cohort.
1. (Top-left) The ROC curve, or receiver operating characteristic curve, demonstrates a model's performance across all thresholds. The curve shows the sensitivity and specificity across all possible thresholds for the model. 
    - This plot can help you assess, both in aggregate and at specific thresholds, how often the model correctly identifies positive and negative cases. The AUROC or C-stat, the area under the ROC curve, is a single measure of how well the model performs across thresholds.
2. (Top-middle) This curve plots the sensitivity and flag rate across all possible thresholds for the model. 
    - This plot can help you determine how frequently your model would trigger workflow interventions at different thresholds, and how many of those interventions would be taken for true positive cases. The highlighted area above the curve indicates how many true positives would be missed at this threshold.
3. (Top-right) The calibration curve measures how reliable a model's predicted probabilities are by comparing them with observed outcomes. 
    - This plots the observed rate (what proportion of cases at that predicted probability are true positives) against the model's predicted probability.
    - Points below the `y=x` line indicate that a model is *overconfident* in its predictions (i.e. identifying more positive cases than exist), and points above the `y=x` line indicate that a model is *underconfident* in its predictions (i.e. identifying fewer positive cases than exist).
4. (Bottom-left) The PR curve, or precision-recall curve, shows the tradeoff between precision and recall across different thresholds. 
    - The curve shows precision and recall across all possible thresholds for the model. This plot can help you assess the tradeoff between identifying more positive cases and correctly identifying positive cases. 
5. (Bottom-middle) This plot shows the sensitivity, specificity, and PPV across all possible thresholds for a model. 
    - This plot is useful for identifying thresholds where your model has high enough specificity, sensitivity, and PPV for your intended workflows.
6. (Bottom-right) This plot shows the predicted probabilities for entities in the dataset stratified by whether or not they met the target criteria. 
    - This plot can help you identify thresholds where your model will correctly identify enough of the true positives without flagging too many of the true negatives.
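Stratifying performance by cohort amounts to computing the same metric separately on each cohort's rows. As a sketch (the cohort labels and scores are hypothetical), the AUROC can be computed without plotting as the fraction of (positive, negative) pairs in which the positive case scores higher, counting ties as one half:

```python
from collections import defaultdict


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_cohort(cohorts, scores, labels):
    """Split rows by cohort value, then compute AUROC within each cohort."""
    rows = defaultdict(lambda: ([], []))
    for cohort, score, label in zip(cohorts, scores, labels):
        rows[cohort][0].append(score)
        rows[cohort][1].append(label)
    return {c: auroc(s, y) for c, (s, y) in rows.items()}


# Hypothetical cohort attribute (age band), scores, and labels
cohorts = ["18-40", "18-40", "18-40", "18-40", "65+", "65+", "65+", "65+"]
scores = [0.9, 0.4, 0.3, 0.1, 0.6, 0.7, 0.5, 0.2]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
print(auroc_by_cohort(cohorts, scores, labels))
# {'18-40': 1.0, '65+': 0.5}
```

The pairwise form is quadratic in cohort size, which keeps the sketch readable; a rank-based computation gives the same value more efficiently on large data.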

**Tips:**
 - Thresholds configured for the model are highlighted on the graphs.
 - Use `sm.cohort_evaluation()` to get model evaluation plots for your model at the observation level.
 - Use `sm.cohort_evaluation(per_context_id=True)` to get model evaluation plots for your model at the encounter level.

Seismic monitoring networks play a vital role in detecting and tracking earthquakes. These networks consist of numerous seismometers strategically placed around the globe to record ground motion. Real-time data from these instruments enable quick identification of earthquake occurrence, determination of its magnitude, and estimation of its effects. This information is crucial for issuing timely warnings and implementing emergency response plans.


#### Visuals

In [None]:
sm.cohort_selector()

In [None]:
sm.cohort_evaluation()

In [None]:
sm.cohort_evaluation(per_context_id=True)

## Outcomes

### Trend comparison

Earthquakes are not only important from a scientific perspective but also impact human lives and communities. Governments, organizations, and individuals take measures to increase earthquake preparedness and resilience. This includes educating the public about earthquake safety, conducting drills, retrofitting buildings to withstand seismic forces, and implementing early warning systems that provide valuable seconds to minutes of alert before the shaking arrives.


In [None]:
sm.plot_trend_intervention_outcome()

### Lead-time Analysis 

#### ℹ Info

Lead-time analysis reveals how much time a high prediction gives before an event of interest. These analyses implicitly restrict data to the positive cohort, since that is the only cohort in which the event is expected to occur.
The visualization uses standard box-and-whisker plots, where each quartile boundary of the subpopulation is marked with a vertical line and the inner box represents the inner quartiles with the mean. Significant overlap between cohorts indicates that the model provides equal opportunity for action, based on the scores, across the cohort groups.
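As a sketch of the computation (the timestamps, the 0.5 threshold, and the use of the first threshold crossing are assumptions, not necessarily seismometer's exact definition), the lead time for each positive case can be taken as the time from the first prediction at or above the threshold to the event. `statistics.quantiles` then gives the quartile cut points a box plot is drawn from.

```python
from datetime import datetime, timedelta
from statistics import quantiles


def lead_time_hours(predictions, event_time, threshold):
    """Hours from the first prediction at or above `threshold` (not after the event) to the event.

    `predictions` is a list of (timestamp, score) pairs. Returns None if no prediction qualifies.
    """
    crossings = [t for t, score in sorted(predictions) if score >= threshold and t <= event_time]
    if not crossings:
        return None
    return (event_time - crossings[0]) / timedelta(hours=1)


start = datetime(2024, 1, 1)
# Hypothetical positive cases: (list of (time, score) predictions, event time)
cases = [
    ([(start, 0.2), (start + timedelta(hours=2), 0.7)], start + timedelta(hours=10)),
    ([(start, 0.6)], start + timedelta(hours=4)),
    ([(start, 0.1), (start + timedelta(hours=1), 0.9)], start + timedelta(hours=7)),
    ([(start, 0.3)], start + timedelta(hours=5)),  # never reaches the threshold
    ([(start, 0.8)], start + timedelta(hours=12)),
]

lead_times = []
for preds, event in cases:
    lt = lead_time_hours(preds, event, 0.5)
    if lt is not None:
        lead_times.append(lt)

print(lead_times)                 # [8.0, 4.0, 6.0, 12.0]
print(quantiles(lead_times, n=4))  # [4.5, 7.0, 11.0]
```

Cases that never cross the threshold have no lead time and drop out, so it is worth tracking how many positives are excluded alongside the distribution itself.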

The study of geology and earthquakes is an ongoing endeavor. Scientists continue to explore the complexities of Earth's interior, the mechanisms driving plate tectonics, and the factors that influence earthquake occurrence. Through interdisciplinary research and technological advancements, our understanding of earthquakes continues to evolve, leading to improved hazard assessment, mitigation strategies, and ultimately, the protection of lives and infrastructure.


#### Visuals

In [None]:
sm.plot_leadtime_enc()