### **FIGURES AND STATISTICS**
by J. Daniel Velez

This notebook plots figures (stacked bar charts, CDFs, scatter plots) and calculates Spearman's and Pearson's correlation coefficients.

In [None]:
from f_filter_process import get_file, call_file, export_dataframe
import matplotlib.pyplot as plt
import numpy as np  # used explicitly for the histogram ticks in section 2
import pandas as pd
from f_stats import *

In [None]:
# Load the input file
rivers_budget = call_file()

In [None]:
rivers_budget

### 1. This section shows the reaches and nodes remaining and deleted after the filtering process. CSV file must be used

In [None]:
# CSV file: keep only the river name and the remaining/deleted counts
river_budgetlis = rivers_budget.reset_index()
selected_data = river_budgetlis[['RIVER', 'REMAINING', 'DELETED']]
selected_data

In [None]:
# Subsetting rivers at reach level
subset_reaches = selected_data[selected_data['RIVER'].str.contains('Reaches')]
print(subset_reaches)

In [None]:
# Stacked bar plot counting the reaches and nodes remaining and deleted after the filtering process
selected_data.plot(x='RIVER', y=['REMAINING', 'DELETED'], kind='bar', stacked=True,  figsize=(10, 5))
plt.title('Rivers nodes final balance')
plt.xlabel('River nodes')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Number of nodes')
plt.show()

### 2. Correlation coefficients. JSON files must be used

In [None]:
# Calculate the Spearman coefficient, the p-values, and the number of pairs used for each Spearman coefficient.
# Load the filtered dictionary first.
N_Spearman, rho_df = S_correlation(rivers_budget)

In [None]:
rho_df = rho_df.rename(columns = {'index':'node_id'})
rho_df
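
As a rough illustration of what a per-node Spearman table contains, the sketch below computes the coefficient, p-value, and number of valid pairs for each node with `scipy.stats.spearmanr`. It is not the implementation of `S_correlation`: the function name `spearman_per_node`, the column names `spearman` and `p_value`, the `min_pairs` default, and the toy data are assumptions made for the example.

```python
import pandas as pd
from scipy.stats import spearmanr


def spearman_per_node(df, node_col="node_id", x_col="width", y_col="wse", min_pairs=3):
    """Return one row per node with Spearman's rho, its p-value and the pair count.

    Hypothetical helper: only rows where both width and wse are present
    count as a valid pair.
    """
    rows = []
    for node, group in df.groupby(node_col):
        pairs = group[[x_col, y_col]].dropna()
        if len(pairs) < min_pairs:
            continue  # too few pairs for a meaningful rank correlation
        rho, p_value = spearmanr(pairs[x_col], pairs[y_col])
        rows.append(
            {node_col: node, "spearman": rho, "p_value": p_value, "num_pairs": len(pairs)}
        )
    return pd.DataFrame(rows, columns=[node_col, "spearman", "p_value", "num_pairs"])


# Toy observations (made up): node 1 widens as wse rises, node 2 narrows.
obs = pd.DataFrame(
    {
        "node_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "width": [10, 12, 15, 20, 30, 28, 25, 21],
        "wse": [1.0, 1.5, 1.7, 2.4, 5.0, 5.5, 6.1, 7.0],
    }
)
result = spearman_per_node(obs)
print(result)
```

Because Spearman's coefficient works on ranks, any strictly monotonic width-wse relation gives a coefficient of 1 or -1 regardless of its shape, which is the case for both toy nodes here.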

In [None]:
# Histogram of the number of pairs used to calculate Spearman's coefficient
plt.hist(rho_df['num_pairs'], bins=20, color='skyblue', edgecolor='black')
# One integer tick per possible pair count
x_ticks = np.arange(int(rho_df['num_pairs'].min()), int(rho_df['num_pairs'].max()) + 1, 1)
plt.xticks(x_ticks)
plt.xlabel('Number of pairs (W-wse)')
plt.ylabel('Frequency')
plt.title('Histogram of valid pairs')
plt.show()

In [None]:
export_dataframe(rho_df,is_geodataframe=False)

Before plotting the CDF, it is worth knowing the best trade-off between the number of width-wse pairs per observation and the number of Spearman correlations above 0.4 in each river.

In [None]:
best_tradeoff(N_Spearman, min_pairs=10, max_pairs=20, step=1)
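
The trade-off can be pictured as a small table: for each candidate minimum number of pairs, count how many nodes survive and how many of them have a Spearman coefficient above 0.4. The sketch below is only an illustration of that idea, not the body of `best_tradeoff`; the function name `tradeoff_table`, the column `spearman`, and the toy values are assumptions.

```python
import pandas as pd


def tradeoff_table(rho_df, min_pairs=10, max_pairs=20, step=1, rho_threshold=0.4):
    """Count nodes kept, and nodes above the rho threshold, per minimum pair count.

    Hypothetical helper: expects the columns 'num_pairs' and 'spearman'.
    """
    rows = []
    for n in range(min_pairs, max_pairs + 1, step):
        kept = rho_df[rho_df["num_pairs"] >= n]
        above = int((kept["spearman"] > rho_threshold).sum())
        rows.append({"min_pairs": n, "nodes_kept": len(kept), "nodes_above": above})
    return pd.DataFrame(rows)


# Toy table (made-up values) with four nodes.
demo = pd.DataFrame({"num_pairs": [10, 12, 15, 20], "spearman": [0.2, 0.5, 0.7, 0.9]})
table = tradeoff_table(demo, min_pairs=10, max_pairs=20, step=5)
print(table)
```

Raising the minimum number of pairs makes each coefficient more reliable but leaves fewer nodes, so the table makes the cost of each threshold visible.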

In [None]:
plot_multiple_cdfs(rho_df,'Tanana')
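
For reference, an empirical CDF can be built by sorting the values and assigning each one its cumulative share of the sample. The sketch below shows that construction with NumPy only; `empirical_cdf` is a hypothetical helper and is not taken from `f_stats`, so `plot_multiple_cdfs` may compute its curves differently.

```python
import numpy as np


def empirical_cdf(values):
    """Return sorted values and the fraction of the sample at or below each one."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])  # drop missing values before ranking
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up Spearman coefficients, including one missing value.
x, y = empirical_cdf([0.8, -0.1, 0.4, np.nan, 0.6])
print(x)
print(y)
```

The resulting pairs can be passed to `plt.step(x, y, where='post')` to draw the curve.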

### **3.** Spearman's coefficient attached to shapefiles

In [None]:
river_shp = call_file()

In [None]:
river_shp

In [None]:
# Join a geodataframe (gdf) with a dataframe (df) into a new geodataframe. Arguments: gdf, df, and the key column ('reach_id' or 'node_id')
river_shp_Sp = geojoin(river_shp,rho_df,'node_id')
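
An attribute join of this kind is commonly written as a `merge` on the shared key. The sketch below uses a plain pandas DataFrame with a text `geometry` column as a stand-in for the GeoDataFrame, so that it runs without geopandas. The left join, the `spearman` column name, and the toy values are assumptions; `geojoin` itself may behave differently.

```python
import pandas as pd

# Stand-in for the shapefile attributes: one row per node (made-up values).
nodes = pd.DataFrame(
    {
        "node_id": [101, 102, 103],
        "geometry": ["POINT (0 0)", "POINT (1 0)", "POINT (2 0)"],
    }
)

# Stand-in for the correlation table: node 102 has no coefficient.
stats = pd.DataFrame({"node_id": [101, 103], "spearman": [0.8, 0.3]})

# Assumption: a left join keeps every node geometry, with NaN where no statistic exists.
joined = nodes.merge(stats, on="node_id", how="left")
print(joined)
```

A left join keeps nodes without a coefficient in the output, which is convenient for mapping; an inner join would drop them instead.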

In [None]:
export_dataframe(river_shp_Sp,is_geodataframe=True)

### **4.** Scatter plots
The second, optional argument (`min_spearman`) sets the minimum Spearman coefficient from which scatter plots are drawn.

In [None]:
hypsometric(rivers_budget, min_spearman=None, min_obs=10, show_p_value=True)

In [None]:
river_shape8 = call_file()

In [None]:
river_shape4 = call_file()

In [None]:
# profiles() accepts up to 3 arguments (shapefiles)
profiles(river_shp)

### **5.** CDF of river width variability

In [None]:
# Function to plot the CDF of the Coefficient of Variation of river widths for each node
width_CV_cdf(river_shp, node_col='node_id', width_col='width', title='CDF of River Width Variability in Atrato River')
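
The coefficient of variation (CV) of a node is the standard deviation of its widths divided by their mean. The sketch below computes it per node with a pandas `groupby`; it is an illustration under assumptions (the helper name `width_cv_per_node`, the `min_observations` default, the sample standard deviation, and the toy data), not the code inside `width_CV_cdf`.

```python
import pandas as pd


def width_cv_per_node(df, node_col="node_id", width_col="width", min_observations=2):
    """Return the coefficient of variation of width for each node.

    Hypothetical helper: uses the sample standard deviation (ddof=1),
    which is the pandas default.
    """
    stats = df.groupby(node_col)[width_col].agg(["mean", "std", "count"])
    stats = stats[stats["count"] >= min_observations]  # skip sparsely observed nodes
    return (stats["std"] / stats["mean"]).rename("cv")


# Toy widths (made up): node 1 never changes, node 2 varies, node 3 has one observation.
widths = pd.DataFrame(
    {
        "node_id": [1, 1, 1, 2, 2, 2, 3],
        "width": [10.0, 10.0, 10.0, 10.0, 20.0, 30.0, 50.0],
    }
)
cv = width_cv_per_node(widths)
print(cv)
```

Because the CV is dimensionless, nodes on narrow and wide rivers can be compared on the same CDF.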

In [None]:
profile_cv(river_shp)

In [None]:
# Calculate the Coefficient of Variation (CV) for width and wse
scatter_cv_width_wse_with_spearman(river_shp)

In [None]:
profile_cv_w_W(river_shp)

In [None]:
plot_w_variability_cdfs(river_shp, node_col='node_id', width_col='width', cv_threshold=1.0, min_observations=10)