<table class="tfo-notebook-buttons" align="left">
    <tr>
        <td>
            <a target="_blank" href="https://github.com/entropyx/murray">
                <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" alt="GitHub Logo" width="32" height="32" />
                View source on GitHub
            </a>
        </td>
        <td>
            <a target="_blank" href="https://colab.research.google.com/github/entropyx/murray/blob/folder-structure/notebooks/Murray_Walkthrough.ipynb">
                <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" />
            </a>
        </td>
    </tr>
</table>


# **Getting Started with Murray**

This practical guide walks through the Murray library, from installation to final results, with examples along the way. The guide is divided into three parts:

- Install
- Experimental design
- Experimental evaluation



The experimental design has the following steps:



1. Upload data
2. Configure experimental design
3. Results


The experimental evaluation has the following steps:

1.   Upload data
2.   Configure and run experimental evaluation
3.   Results



## Step 0: Install

Before starting with the best part, you must first install the Murray package.

In [None]:
!pip install git+https://github.com/entropyx/murray.git

Once the package is installed in your environment, import it along with its functions. A simple way to do this is as follows:

In [None]:
import pandas as pd
from Murray import cleaned_data
from Murray import plot_geodata, plot_impact_graphs, plot_impact_graphs_evaluation, plot_metrics, plot_permutation_test
from Murray import print_locations, print_weights, print_incremental_results, print_incremental_results_evaluation
from Murray import run_geo_analysis, run_geo_evaluation

## Step 1: Experimental design

### Upload data

Load your data in CSV format.

In [None]:
data = pd.read_csv("data.csv")

1\. Before continuing, the data must be “cleaned”, that is, free of irregular or missing values. You can either verify yourself that the loaded data meets these requirements or use Murray's ```cleaned_data``` function, which handles the whole process. This guide uses the function.



2\. The function takes three parameters: the name of the target variable column, the name of the column that contains the locations, and the name of the date column.

In [None]:
data = cleaned_data(data,col_target='Sessions',col_locations='Region',col_dates='Date')

The function cleans the data of irregularities. Afterwards, you can use the ```plot_geodata``` function to visualize the data.

In [None]:
plot_geodata(data)
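If you prefer to check the requirements yourself before (or instead of) calling ```cleaned_data```, a minimal sketch with plain pandas could look like the following. The helper `check_panel` and the toy data are hypothetical and are not part of Murray; the sketch only counts missing targets, unparseable dates, and duplicated location/date pairs.

```python
import pandas as pd

# Toy data for illustration only; column names mirror the example above.
raw = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02"],
    "Region": ["colima", "oaxaca", "colima", "oaxaca", "oaxaca"],
    "Sessions": [120.0, 95.0, None, 101.0, 101.0],
})

def check_panel(df, col_target, col_locations, col_dates):
    """Hypothetical helper: return simple data-quality counts for a location/date panel."""
    dates = pd.to_datetime(df[col_dates], errors="coerce")
    return {
        "missing_target": int(df[col_target].isna().sum()),
        "unparseable_dates": int(dates.isna().sum()),
        "duplicate_rows": int(df.duplicated(subset=[col_locations, col_dates]).sum()),
    }

report = check_panel(raw, "Sessions", "Region", "Date")
print(report)
```

Any non-zero count is a hint that the data still needs cleaning before it is used in the design.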

### Configure experimental design

Setting the configuration parameters is easy. You only need to provide ```data```, ```excluded_locations```, ```maximum_treatment_percentage```, ```significance_level```, ```deltas_range``` and ```periods_range```, as in the call below. These parameters are configured on a per-user basis. They are few and intuitive, so the package is quick to use while Murray takes care of everything else.

In [None]:
geo_desing = run_geo_analysis(
    data=data,
    excluded_locations=['mexico city', 'state of mexico'],
    maximum_treatment_percentage=0.30,
    significance_level=0.10,
    deltas_range= (0.01, 0.25, 0.01),
    periods_range=(5,30, 5)
)

The results of the test provide a visualization of the sensitivity across all admitted periods and the different holdouts.
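To get a feel for how many configurations the call above explores, the sketch below expands the two range tuples. It assumes that ```deltas_range``` and ```periods_range``` are read as `(start, stop, step)` with an exclusive stop, like Python's `range`; the guide does not state this, so check the Murray documentation before relying on it. The helper `expand` is hypothetical.

```python
def expand(start, stop, step, ndigits=4):
    """Hypothetical helper: list the values of a (start, stop, step) tuple, stop excluded."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, ndigits) for i in range(n)]

deltas = expand(0.01, 0.25, 0.01)   # candidate effect sizes
periods = expand(5, 30, 5)          # candidate treatment lengths

print(len(deltas), "deltas from", deltas[0], "to", deltas[-1])
print("periods:", periods)
print("delta/period combinations:", len(deltas) * len(periods))
```

Under that assumption, every holdout in the heatmap is evaluated against each delta and period combination.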

### Results

Once the heatmap is displayed, you can choose the configuration that suits you best. After that, you can use the following functions to display the experiment results, such as the treatment and control locations, as well as metrics like MAE (Mean Absolute Error) and MAPE (Mean Absolute Percentage Error).

The first function returns the states that make up the treatment and control groups. You can define a variable for the holdout or simply pass the numerical value to the functions.

In [None]:
holdout = 80  # example value; replace it with the holdout from your own results

###### Treatment and control groups

In [None]:
print_locations(geo_desing,holdout_percentage=holdout)

###### Weights

To display a dataframe showing how each control state influenced the construction of the counterfactual, you can use the following function:

In [None]:
print_weights(geo_desing,holdout_percentage=holdout)

NOTE: Some weights are negative; however, these states have a very small influence.
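As a rough illustration of what such weights mean, the sketch below builds a counterfactual as a weighted sum of control series and shows how little a small negative weight contributes. This is a generic synthetic-control style example with made-up numbers; it is not taken from Murray's internals, and the way Murray fits its weights is not described in this guide.

```python
import numpy as np

# Rows are time steps, columns are three hypothetical control states.
controls = np.array([
    [100.0, 50.0, 10.0],
    [110.0, 55.0, 10.0],
    [120.0, 60.0, 10.0],
])

# Made-up weights; the third one is small and negative.
weights = np.array([0.8, 0.4, -0.05])

# Counterfactual for the treatment group: weighted sum of the control series.
counterfactual = controls @ weights

# Share of the total absolute weight carried by each control state.
share = np.abs(weights) / np.abs(weights).sum()

print(counterfactual)
print(share.round(3))
```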

###### Impact graphs

Murray can plot the experiment, the chance effect and the cumulative effect so you can see graphically how the experiment behaves. You need the following parameters: ```geodata```, ```period``` and ```holdout_percentage```.

In [None]:
plot_impact_graphs(geo_desing,period=20,holdout_percentage=holdout)

###### Incremental results

Metrics such as ATT and total count can also be printed for the selected configuration. The function takes the same parameters as the previous one (```geodata```, ```period``` and ```holdout_percentage```).

In [None]:
print_incremental_results(geo_desing,period=20,holdout_percentage=holdout)

###### Metrics (MAPE and MAE)

The choice of the best configuration should also favor a group of locations that best represents the treatment group. The following function plots the MAE and MAPE metrics of the different groups that were chosen. You only need to enter one parameter.

In [None]:
plot_metrics(geo_desing)
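For reference, MAE and MAPE follow their standard definitions: the mean of the absolute errors, and the mean of the absolute errors relative to the actual values, expressed as a percentage. The sketch below computes both on made-up numbers; it only illustrates the formulas and does not reproduce how Murray computes them internally.

```python
# Made-up actual values for a treatment group and the values predicted from controls.
actual = [100.0, 120.0, 80.0, 110.0]
predicted = [90.0, 126.0, 84.0, 110.0]

abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# MAE: average absolute error, in the units of the target variable.
mae = sum(abs_errors) / len(abs_errors)

# MAPE: average absolute error relative to the actual value, in percent.
mape = 100 * sum(e / abs(a) for e, a in zip(abs_errors, actual)) / len(actual)

print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%")
```

Lower values of both metrics indicate that the control group tracks the treatment group more closely.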

## Step 2: Experimental evaluation

In this section Murray will be used to analyze data where a treatment has already been applied. As in the design, the first thing to do is to load and read the data.

### Upload data

In [None]:
data = pd.read_csv("data_marketing_campaign.csv")

Then, as explained above, you can use the ```cleaned_data``` function if necessary to give the data the proper structure.

In [None]:
data = cleaned_data(data,col_target='Sessions',col_locations='Region',col_dates='Date')

Now, you can visualize the data if required.

In [None]:
plot_geodata(data)

### Configure experimental evaluation

To run the function that performs the evaluation, you only need to enter the following parameters: ```data```, ```start_treatment```, ```end_treatment```, ```treatment_group``` and ```spend```.

Note: Make sure the dates are written day first (DD-MM-YYYY, as in the example below).

In [None]:
treatment_group = ['baja california', 'chiapas', 'chihuahua', 'colima', 'guerrero', 'hidalgo', 'michoacan', 'oaxaca', 'san luis potosi', 'tamaulipas', 'texas']
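Before running the evaluation, it can help to confirm that every name in ```treatment_group``` actually appears in your data. The sketch below does this on a toy frame with plain pandas. It assumes names are compared after trimming spaces and lowercasing; whether Murray normalizes names this way is not stated in this guide, and the toy data is hypothetical.

```python
import pandas as pd

# Toy location column for illustration only.
panel = pd.DataFrame({"Region": ["Colima", "Oaxaca ", "chiapas", "Sonora"]})
candidate_group = ["colima", "oaxaca", "chiapas", "texas"]

# Normalize the names in the data, then list group members that are absent.
available = set(panel["Region"].str.strip().str.lower())
missing = sorted(set(candidate_group) - available)

print("Locations not found in the data:", missing)
```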

In [None]:
results_evaluation = run_geo_evaluation(
    data=data,
    start_treatment='01-12-2024',
    end_treatment='31-12-2024',
    treatment_group=treatment_group,
    spend=25000
)
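The day-first convention matters because a string such as `'01-12-2024'` is ambiguous: read day first it is 1 December 2024, read month first it is 12 January 2024. The sketch below parses the two dates from the call above with the standard library, using an explicit `%d-%m-%Y` format. That format string is an assumption based on the example values, not a documented Murray setting.

```python
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # assumed day-first format, matching the example values

start = datetime.strptime("01-12-2024", DATE_FORMAT)
end = datetime.strptime("31-12-2024", DATE_FORMAT)

# Length of the treatment window, counting both endpoints.
n_days = (end - start).days + 1

print(start.date(), "to", end.date(), "->", n_days, "days")
```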

### Results

Once the evaluation function is executed, you can display different results with the following functions.

###### Impact graphs

To obtain the impact graphs you only need to enter the ```results_evaluation``` parameter to the following function.

In [None]:
plot_impact_graphs_evaluation(results_evaluation=results_evaluation)

###### Incremental results

To print the ATT value, the total increment, and either iROAS or iCPA, pass the same ```results_evaluation``` object as in the previous function, together with the metric you want to calculate, either iROAS or iCPA.

In [None]:
print_incremental_results_evaluation(results_evaluation,'iCPA')
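As a worked example of how these quantities relate, the sketch below uses the usual definitions: the lift is the observed value minus the counterfactual, ATT is the average lift over the treatment period, iROAS is the incremental outcome per unit of spend, and iCPA is the spend per incremental unit. The series are made up and the spend matches the call above; the exact formulas Murray applies are not shown in this guide, so treat this as an illustration only.

```python
# Made-up observed and counterfactual values for the treatment period.
observed = [1200.0, 1300.0, 1250.0, 1350.0]
counterfactual = [1000.0, 1050.0, 1000.0, 1050.0]
spend = 25000.0

lift = [o - c for o, c in zip(observed, counterfactual)]

total_incremental = sum(lift)        # total increment over the period
att = total_incremental / len(lift)  # average treatment effect on the treated
iroas = total_incremental / spend    # incremental outcome per unit of spend
icpa = spend / total_incremental     # spend per incremental unit

print(f"ATT = {att}, total = {total_incremental}, iROAS = {iroas}, iCPA = {icpa}")
```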

###### Permutation test

Finally, to plot the hypothesis test that was performed, use the following function.

In [None]:
plot_permutation_test(results_evaluation)
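For readers unfamiliar with permutation tests, the sketch below shows the general idea on made-up numbers: shuffle the group labels many times, recompute the effect each time, and compare the observed effect with that null distribution. It is a generic two-group example with NumPy, not Murray's actual procedure, which this guide does not detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up outcomes for a treated group and a control group.
treated = np.array([12.0, 13.0, 14.0, 15.0, 16.0])
control = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
observed = treated.mean() - control.mean()

# Null distribution: difference in means after randomly reassigning the labels.
pooled = np.concatenate([treated, control])
n_perm = 2000
null_effects = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    null_effects[i] = shuffled[:treated.size].mean() - shuffled[treated.size:].mean()

# One-sided p-value with the usual +1 correction.
p_value = (np.sum(null_effects >= observed) + 1) / (n_perm + 1)

print(f"observed effect = {observed}, p-value = {p_value:.4f}")
```

A small p-value means that an effect as large as the observed one rarely appears when the labels are assigned at random.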