## Gold analysis
Hypothesis: if Wednesday's high is lower than Monday's high, Wednesday's low should be revisited on Thursday (revisited = on Thursday the price crosses Wednesday's low, from above or from below).

### Steps:
- Compute the daily highs and lows
    - this requires resampling the data (to obtain 1-day candles)
- Label each row with its day of the week (add a column that says which day it is)
- Check whether Wednesday's high is lower than Monday's
    - save Wednesday's low
    - check whether Thursday's range includes Wednesday's low.
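
The steps above can be sketched with plain pandas on a tiny synthetic intraday series. This is only an illustration under stated assumptions: the notebook itself uses Dask with the cuDF backend on a file that is already daily, and the prices and the `intraday` name below are made up for the example.

```python
import pandas as pd

# Hypothetical prices, two per day, Mon 6 Jan 2014 through Thu 9 Jan 2014.
idx = pd.to_datetime([
    "2014-01-06 00:00", "2014-01-06 12:00",
    "2014-01-07 00:00", "2014-01-07 12:00",
    "2014-01-08 00:00", "2014-01-08 12:00",
    "2014-01-09 00:00", "2014-01-09 12:00",
])
intraday = pd.Series([10.0, 12.0, 11.0, 11.5, 9.0, 11.0, 8.5, 10.0],
                     index=idx, name="price")

# Steps 1-2: resample to one candle per day and tag each day (0 = Monday).
daily = intraday.resample("1D").agg(["max", "min"])
daily = daily.rename(columns={"max": "high", "min": "low"})
daily["weekday"] = daily.index.weekday

# Step 3: compare Wednesday's high with Monday's, then test Thursday's range.
mon = daily[daily["weekday"] == 0].iloc[0]
wed = daily[daily["weekday"] == 2].iloc[0]
thu = daily[daily["weekday"] == 3].iloc[0]

wed_lower_than_mon = bool(wed["high"] < mon["high"])
thu_revisits_wed_low = bool(thu["low"] <= wed["low"] <= thu["high"])
print(wed_lower_than_mon, thu_revisits_wed_low)
```

With a single week the comparison can be done row by row; the notebook generalizes this to every week by grouping on year and week.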

### Reading the CSV file and converting it to a Parquet file (if needed):

In [1]:
# #import the libraries
# import cudf
# import dask
# import dask.dataframe as dd
# #set the environment to cuDF so we use the GPU
# dask.config.set({"dataframe.backend": "cudf"})
# #----------------------------------------------

# xau1D = dd.read_csv('/home/edoardocame/Desktop/python_dir/xauusd-d1-bid-2014-01-01-2024-12-11T23.csv')
# xau1D['timestamp'] = dd.to_datetime(xau1D['timestamp'])
# xau1D = xau1D.set_index('timestamp', sorted=True)
# xau1D['weekday'] = xau1D.index.to_series().dt.weekday
# xau1D.head()

### Using parquet file:

In [1]:
#import the libraries
import dask
import dask.dataframe as dd
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()
client = Client(cluster)
client
#----------------------------------------------
dask.config.set({"dataframe.backend": "cudf"})


df = dd.read_parquet('/home/edoardocame/Desktop/python_dir/xauusd1D.parquet')
df['returns'] = df['close'].diff() / df['close'].shift(1)
df['week'] = df.index.dt.isocalendar().week
df['year'] = df.index.dt.isocalendar().year
df.head()

timestamp,open,high,low,close,volume,weekday,returns,week,year
2014-01-01,1203.612,1205.883,1202.302,1205.883,0.2705,2,,1,2014
2014-01-02,1205.913,1230.773,1204.893,1223.71,27.3592,3,0.014783358,1,2014
2014-01-03,1223.687,1240.153,1223.297,1236.683,26.3572,4,0.010601368,1,2014
2014-01-05,1236.983,1238.353,1233.842,1234.042,0.328,6,-0.002135551,1,2014
2014-01-06,1234.042,1248.342,1214.626,1237.665,26.1419,0,0.002935881,2,2014


In [2]:
# Create separate dataframes for each day we need
# Group by year and week, then get the first occurrence (should be only one per day anyway)
monday_data = df[df['weekday'] == 0].groupby(['year', 'week'])['high'].first()
wednesday_data = df[df['weekday'] == 2].groupby(['year', 'week'])[['high','low']].first()
thursday_data = df[df['weekday'] == 3].groupby(['year', 'week'])[['high','low']].first()

weekly_analysis = dd.concat([monday_data.rename('mon_high'), wednesday_data.rename(columns={'high':'wed_high', 'low':'wed_low'}), thursday_data], axis=1)
weekly_analysis.head()

We're assuming that the indices of each dataframes are aligned. This assumption is not generally safe.


year,week,mon_high,wed_high,wed_low,high,low
2014,1,,1205.883,1202.302,1230.773,1204.893
2014,2,1248.342,1231.888,1218.423,1231.463,1223.475
2014,3,1255.042,1244.012,1234.242,1245.198,1236.328
2014,4,1258.205,1243.557,1235.542,1265.433,1231.523
2014,5,1276.438,1270.142,1250.883,1267.677,1238.043


In [3]:
weekly_analysis['wed_lower_than_mon'] = weekly_analysis['wed_high'] < weekly_analysis['mon_high']

weekly_analysis['thurs_crosses_wed'] = (
    (weekly_analysis['wed_lower_than_mon']) & 
    (weekly_analysis['low'] <= weekly_analysis['wed_low']) & 
    (weekly_analysis['high'] >= weekly_analysis['wed_low'])
)

weekly_analysis.head()

year,week,mon_high,wed_high,wed_low,high,low,wed_lower_than_mon,thurs_crosses_wed
2014,1,,1205.883,1202.302,1230.773,1204.893,,
2014,2,1248.342,1231.888,1218.423,1231.463,1223.475,True,False
2014,3,1255.042,1244.012,1234.242,1245.198,1236.328,True,False
2014,4,1258.205,1243.557,1235.542,1265.433,1231.523,True,True
2014,5,1276.438,1270.142,1250.883,1267.677,1238.043,True,True


In [5]:
eventi = weekly_analysis['thurs_crosses_wed'].sum().compute()
# Note: the denominator is every week in the table, not only the weeks
# where Wednesday's high was below Monday's high.
osservazioni = len(weekly_analysis['thurs_crosses_wed'])
print(f"Out of a total of {osservazioni} weeks, the event occurred in {eventi}")

Out of a total of 572 weeks, the event occurred in 153


In [6]:
client.shutdown()

# In-depth code explanation:


1. **Initial Setup and Data Loading**


In [None]:
import dask
import dask.dataframe as dd
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Sets up a local CUDA cluster for GPU computations
cluster = LocalCUDACluster()
client = Client(cluster)
dask.config.set({"dataframe.backend": "cudf"})

This section starts a local Dask cluster with one worker per visible GPU (`LocalCUDACluster`) and connects a client to it. Dask handles the parallel scheduling, and cuDF (a GPU DataFrame library) is set as the DataFrame backend.

2. **Data Preparation**


In [None]:
df = dd.read_parquet('/home/edoardocame/Desktop/python_dir/xauusd1D.parquet')
df['returns'] = df['close'].diff() / df['close'].shift(1)  # Calculate daily returns
df['week'] = df.index.dt.isocalendar().week    # Extract week number
df['year'] = df.index.dt.isocalendar().year    # Extract year

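The code reads a Parquet file containing daily gold (XAU/USD) price data, which already includes a `weekday` column (0 = Monday), and adds three new columns:
- `returns`: simple daily return, i.e. the fractional change in `close` from the previous row (0.0148 means about 1.48%)
- `week`: ISO week number
- `year`: ISO year, which can differ from the calendar year in the days around New Year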
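
To make these columns concrete, here is a small standard-library sketch that recomputes the first return from the table shown earlier and checks the ISO week of a few dates. It uses `datetime.date` rather than the Dask/cuDF accessors, so it illustrates the definitions, not the notebook's code path.

```python
from datetime import date

# First two closes from the table shown earlier (2014-01-01 and 2014-01-02).
prev_close, close = 1205.883, 1223.71

# Same formula as the notebook: diff() / shift(1) gives a simple (fractional) return.
simple_return = (close - prev_close) / prev_close
print(round(simple_return, 6))

# isocalendar() yields (ISO year, ISO week, ISO weekday); ISO weeks start on Monday.
sunday = tuple(date(2014, 1, 5).isocalendar())    # Sunday, still ISO week 1
monday = tuple(date(2014, 1, 6).isocalendar())    # Monday, ISO week 2
# Near a year boundary the ISO year can differ from the calendar year.
boundary = tuple(date(2014, 12, 29).isocalendar())
print(sunday, monday, boundary)
```

Grouping on the ISO year together with the ISO week keeps a week that straddles New Year in a single group, which grouping on the calendar year would split.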

3. **GroupBy Logic (The Core Analysis)**


In [None]:
# Create separate dataframes for each day
monday_data = df[df['weekday'] == 0].groupby(['year', 'week'])['high'].first()
wednesday_data = df[df['weekday'] == 2].groupby(['year', 'week'])[['high','low']].first()
thursday_data = df[df['weekday'] == 3].groupby(['year', 'week'])[['high','low']].first()



Let's break down this groupby logic in detail:

a) **First Filter**: `df[df['weekday'] == X]`
   - Filters rows for specific days (0=Monday, 2=Wednesday, 3=Thursday)
   - Creates subsets containing only data for those specific days

b) **GroupBy Operation**: `.groupby(['year', 'week'])`
   - Groups the filtered data by both year and week
   - Creates nested groups where each group represents a specific week in a specific year
   - Example structure:
     ```
     2014, Week 1 → [Monday data for this week]
     2014, Week 2 → [Monday data for this week]
     2015, Week 1 → [Monday data for this week]
     ```

c) **Aggregation**: `.first()`
   - Takes the first record from each group
   - Since we filtered by specific days, this gives us the values for that specific day in each week
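
The three stages (filter, group, take the first) can be mimicked in pure Python on the five rows shown in the first output table. `first_high_per_week` is a hypothetical helper written for this illustration; it is not part of Dask or cuDF.

```python
from datetime import date

# (date, high) pairs copied from the first rows of the daily table shown earlier.
rows = [
    (date(2014, 1, 1), 1205.883),
    (date(2014, 1, 2), 1230.773),
    (date(2014, 1, 3), 1240.153),
    (date(2014, 1, 5), 1238.353),
    (date(2014, 1, 6), 1248.342),
]

def first_high_per_week(rows, weekday):
    """Keep rows on `weekday` (0 = Monday), then take the first high per (ISO year, ISO week)."""
    result = {}
    for day, high in rows:
        if day.weekday() != weekday:
            continue                      # a) filter on the day of the week
        iso = day.isocalendar()
        key = (iso[0], iso[1])            # b) group key: (year, week)
        result.setdefault(key, high)      # c) first(): later rows never overwrite
    return result

monday_high = first_high_per_week(rows, weekday=0)
wednesday_high = first_high_per_week(rows, weekday=2)
print(monday_high)
print(wednesday_high)
```

The only Monday in these rows falls in week 2 and the only Wednesday in week 1, which is why the combined weekly table has no `mon_high` for 2014 week 1.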

4. **Data Combination and Analysis**


In [None]:
weekly_analysis = dd.concat([
    monday_data.rename('mon_high'), 
    wednesday_data.rename(columns={'high':'wed_high', 'low':'wed_low'}), 
    thursday_data
], axis=1)

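This combines the grouped data into a single DataFrame, aligned on the `(year, week)` index, where each row represents a week, containing:
- Monday's high (`mon_high`)
- Wednesday's high and low (`wed_high`, `wed_low`)
- Thursday's high and low (left with their original names, `high` and `low`)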
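
The alignment behaviour is easier to see on a few rows. The sketch below uses plain pandas (an assumption made so it runs without a GPU; the notebook uses `dd.concat`) with values copied from the weekly table above, and it also renames Thursday's columns, a suggestion that is not in the original code.

```python
import pandas as pd

def week_index(keys):
    # Hypothetical helper: build the (year, week) index used by the grouped results.
    return pd.MultiIndex.from_tuples(keys, names=["year", "week"])

mon = pd.Series([1248.342, 1255.042],
                index=week_index([(2014, 2), (2014, 3)]), name="mon_high")
wed = pd.DataFrame(
    {"wed_high": [1205.883, 1231.888, 1244.012],
     "wed_low": [1202.302, 1218.423, 1234.242]},
    index=week_index([(2014, 1), (2014, 2), (2014, 3)]),
)
thu = pd.DataFrame(
    {"high": [1230.773, 1231.463, 1245.198],
     "low": [1204.893, 1223.475, 1236.328]},
    index=week_index([(2014, 1), (2014, 2), (2014, 3)]),
)

# Renaming Thursday's columns too avoids ambiguous `high`/`low` names later on.
thu = thu.rename(columns={"high": "thu_high", "low": "thu_low"})

# axis=1 aligns rows on the (year, week) index; weeks absent from one input become NaN.
weekly = pd.concat([mon, wed, thu], axis=1).sort_index()
print(weekly)
```

Because the rows are matched by index label, a week that is missing from one of the inputs does not shift the others; it just leaves a gap in that column.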

5. **Pattern Analysis**


In [None]:
weekly_analysis['wed_lower_than_mon'] = weekly_analysis['wed_high'] < weekly_analysis['mon_high']

weekly_analysis['thurs_crosses_wed'] = (
    (weekly_analysis['wed_lower_than_mon']) & 
    (weekly_analysis['low'] <= weekly_analysis['wed_low']) & 
    (weekly_analysis['high'] >= weekly_analysis['wed_low'])
)

This creates two boolean columns:
- `wed_lower_than_mon`: True if Wednesday's high is lower than Monday's high
- `thurs_crosses_wed`: True only if `wed_lower_than_mon` is True and Thursday's range contains Wednesday's low (Thursday's low <= Wednesday's low <= Thursday's high), i.e. the pattern is confirmed
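
As a quick check of the logic, the same test can be written for a single week and applied to two rows of the table above. `thursday_revisits` is a hypothetical helper used only for this illustration.

```python
def thursday_revisits(mon_high, wed_high, wed_low, thu_high, thu_low):
    """Mirror the two boolean columns for a single week."""
    wed_lower_than_mon = wed_high < mon_high
    # Thursday's range [thu_low, thu_high] must contain Wednesday's low.
    in_range = thu_low <= wed_low <= thu_high
    return wed_lower_than_mon and in_range

# Rows for 2014 weeks 2 and 4, copied from the table above.
week_2 = thursday_revisits(1248.342, 1231.888, 1218.423, 1231.463, 1223.475)
week_4 = thursday_revisits(1258.205, 1243.557, 1235.542, 1265.433, 1231.523)
print(week_2, week_4)  # False True
```

In week 2 Thursday's low (1223.475) stays above Wednesday's low (1218.423), so the level is never touched; in week 4 Thursday trades down through it.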

6. **Results Analysis**


In [None]:
eventi = weekly_analysis['thurs_crosses_wed'].sum().compute()
osservazioni = len(weekly_analysis['thurs_crosses_wed'])

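Finally, it counts how many times the pattern occurred (`eventi`) out of the total number of weeks in the table (`osservazioni`). Note that this denominator includes weeks in which Wednesday's high was not below Monday's high, so the ratio is an unconditional frequency rather than the success rate of the hypothesis.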
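
Since the hypothesis is conditional, it may be more informative to divide by the weeks in which the setup actually occurred. The sketch below shows both ratios on the five weeks visible in the table above; it is a pure-Python illustration, and the conditional figure for the full dataset is not computed in the notebook.

```python
# (wed_lower_than_mon, thurs_crosses_wed) for the five weeks shown in the table above.
# None stands for the missing value in 2014 week 1, which has no Monday row.
weeks = [(None, None), (True, False), (True, False), (True, True), (True, True)]

events = sum(1 for _, crossed in weeks if crossed)
all_weeks = len(weeks)
setup_weeks = sum(1 for setup, _ in weeks if setup)

print(f"unconditional: {events}/{all_weeks} = {events / all_weeks:.2f}")
print(f"given Wed high < Mon high: {events}/{setup_weeks} = {events / setup_weeks:.2f}")
```

In the notebook's terms, the conditional denominator would be the sum of the `wed_lower_than_mon` column instead of the length of the table.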

The GroupBy operation is particularly powerful here because it:
1. Organizes data into meaningful weekly segments
2. Maintains the year-week relationship
3. Allows easy extraction of specific day's values within each week
4. Enables efficient pattern matching across different days of the week
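5. Tolerates missing days: a week without one of the three days simply has no group for that day, and shows up as a missing value (e.g. `mon_high` for 2014 week 1) once the results are combined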

The code effectively identifies a specific trading pattern where Wednesday's high is lower than Monday's high, and Thursday's price range crosses Wednesday's low, which could potentially be used for trading strategies.