# Event Analysis
This jupyter notebook will show you how to use the Event Analysis Library
You can install the library by running 
```
pip install event-analysis
```
If you have an Nvidia GPU and you want to accelerate your workload using the GPU, you will need to install PyCuda which you can by running 
```
pip install pycuda
```
The library provides functions for Event Synchronisation and Event Coincidence Analysis. Read the documentation if you want to learn more about them.

In [None]:
!pip install event-analysis

To use the library you first need to create a pandas dataframe. The dataframe must satisfy 2 properties
<ol>
    <li>The index must be an increasing time series of python datetime objects*
    <li>The internal data of the dataframe must be boolean 
</ol>
The columns will be the name of the event series

*1) Internally the library will store the timeseries in an int32 array such that 
```
    timeseries[i] = timeseries[0] + deltatime(hour = array[i])
```
This means if your timeseries has time difference's which are not multiple of an hour then the algorithm will give wrong results. 
If your time series spans over 65 years i.e 
```
    (timeseries[-1] - timeseries[0]).years > 65
```
then an integer overflow will cause the algorithm to give wrong results. If your specific use case meets either of these conditions please open an issue, or better yet fix them yourself and open a pull request! Contributions are much appritiated. 

Inside the library I have written helper function to convert my data into this format. You will have to write your own.

In [19]:
from helperfunction import numpy_from_csv_data, get_Df_From_numpy

In [17]:
import geopandas as gpd
import datetime
import numpy as np
india_mp_gdf = gpd.read_file("/home/rkumar/Documents/climate_paper_recreate/plotting/shapefile/Ind.shp")
map_shape_object = india_mp_gdf.geometry
file_name = "/home/rkumar/Documents/climate_paper_recreate/trmm_3hrs_precp_2003_2019.csv"
numpy_mat, missing_vals_index_set, viable_columns = numpy_from_csv_data(file_name, map_shape_object)
starting_date = datetime.datetime(2003,1,1,hour=0,minute=0,second=0,microsecond=0)
time_delta = datetime.timedelta(hours=3)
rainDataDf = get_Df_From_numpy(numpy_mat,  viable_columns, starting_date = starting_date, time_delta = time_delta)
monsoonDataDf = rainDataDf[rainDataDf.index.month >= 6 | (rainDataDf.index.month <=9) ]
monsoonEventDf = monsoonDataDf > np.percentile(monsoonDataDf.to_numpy(copy = True), 98, method = "lower", overwrite_input = True)

Dropping Not Point elements / Points outside shape
Number of elements dropped :: 45002
Starting main Loop :: Remove Missing Values


14384it [00:26, 542.39it/s]


Number of values dropped :: 115526


I am going to be analysing the rainfall events over the Indian sub-continent at 3 hour different from the year 2003 to 2007.

In [18]:
monsoonEventDf.head()

Unnamed: 0_level_0,X93.875Y6.875,X93.625Y7.375,X77.375Y8.125,X77.125Y8.375,X77.375Y8.375,X77.625Y8.375,X77.875Y8.375,X76.875Y8.625,X77.125Y8.625,X77.375Y8.625,...,X74.875Y36.625,X75.125Y36.625,X75.375Y36.625,X75.625Y36.625,X73.875Y36.875,X74.375Y36.875,X74.625Y36.875,X74.875Y36.875,X75.125Y36.875,X75.375Y36.875
DateTime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2003-07-01 00:00:00,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2003-07-01 03:00:00,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2003-07-01 06:00:00,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2003-07-01 09:00:00,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,True
2003-07-01 12:00:00,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [20]:
EA_object = EventAnalysis(monsoonEventDf)

## Event Synchronization

In [None]:
EA_object.ES()