# Tables description

## Overview
This document explains how each table is composed and provides a test to verify the relation between the session tables and the session execution tables.

**Author**: Oscar Javier Bastidas Jossa. 

**Email**: oscar.jossa@deusto.es.


Throughout the document, columns are referenced with the following notation, where the table name is in bold and the column name follows the underscore:

**Table**_<column_on_table>

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

## session_executions

In [2]:

session_executions = pd.read_csv('data/session_executions.csv', on_bad_lines='skip', low_memory=False, header = None)
session_executions.columns = ['id', 'scheduled_at', 'user_program_id', 
                              'difficulty_feedback', 'enjoyment_feedback',
                             'feedback_comment', 'reps_executed',
                             'execution_time', 'order', 'created_at',
                             'updated_at', 'front_end_id', 'session_id',
                             'discarded', 'discard_reason', 'imported']
session_executions2 = session_executions.drop(['scheduled_at', 'feedback_comment',
                                              'order', 'created_at',
                                              'front_end_id', 'imported'], axis = 1)
session_executions2

Unnamed: 0,id,user_program_id,difficulty_feedback,enjoyment_feedback,reps_executed,execution_time,updated_at,session_id,discarded,discard_reason
0,4201,2016,7,4,600,\N,2021-08-11 11:36:19.785554,659,f,\N
1,4283,2272,\N,\N,\N,\N,2021-08-30 08:25:17.488913,536,t,4
2,4399,2393,4,3,144,\N,2021-09-10 06:31:18.888265,692,f,\N
3,4850,2791,7,5,576,\N,2021-09-29 08:00:27.507686,644,f,\N
4,4672,2447,5,3,174,\N,2021-09-21 08:30:05.658811,691,f,\N
...,...,...,...,...,...,...,...,...,...,...
738683,746511,75140,7,5,228,\N,2022-05-27 07:24:41.594127,552,f,\N
738684,746512,77311,5,3,200,\N,2022-05-27 07:25:29.25213,543,f,\N
738685,746513,27704,7,4,358,\N,2022-05-27 07:26:34.46005,601,f,\N
738686,746514,77078,5,3,291,\N,2022-05-27 07:38:41.315872,544,f,\N


This table contains the executed sessions. The fields that relate to other tables are described below:

session_id: The id of the session that was executed. A session can have multiple session_blocks, which can be told apart by **session_blocks2**_order and **session_blocks2**_block_type.

user_program_id: The user program id. A user can have multiple programs.
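
The preview above shows `\N` placeholders and `t`/`f` flags as plain text. The following is a minimal sketch of how they could be turned into proper missing values and booleans at load time. It assumes that `\N` marks a NULL value (as in PostgreSQL's text export format) and that `t`/`f` mean true/false; neither is confirmed by the data owner. The two rows are copied from the preview and reduced to five columns for brevity.

```python
import io
import pandas as pd

# Two rows taken from the preview above, reduced to five columns.
raw = io.StringIO(
    "4201,7,659,f,\\N\n"
    "4283,\\N,536,t,4\n"
)
columns = ['id', 'difficulty_feedback', 'session_id', 'discarded', 'discard_reason']

# Assumption: the literal text \N stands for a missing value.
sample = pd.read_csv(raw, header=None, names=columns, na_values=['\\N'])

# Assumption: 't' and 'f' are boolean flags.
sample['discarded'] = sample['discarded'].map({'t': True, 'f': False})

print(sample)
print(sample.dtypes)
```

With this conversion, columns such as difficulty_feedback become numeric and can be aggregated directly instead of being treated as strings.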

## Tables related to sessions

### sessions

A session comprises **session_blocks**, each of which is further subdivided into **session_sets**. These **session_sets**, in turn, consist of **exercise_sets**, and the **exercise_sets** are ultimately comprised of **exercises**.

In [3]:
# Table of sessions 
sessions = pd.read_csv('data/sessions.csv', on_bad_lines='skip', low_memory=False)
sessions2 = sessions.drop(['level', 'reps', 'created_at', 'updated_at', 'strength', 
                           'endurance', 'technique', 'flexibility', 'intensity',
                           'name_es','description_en', 'description_es'], axis = 1)
sessions2

Unnamed: 0,id,order,session_type,time_duration,code_name,name_en,calories,warmup_id,cooldown_id
0,77,5.0,,826.0,PH1-2-5,CPH,137.0,,
1,105,3.0,,1200.0,PH3-2-3,IRP,316.0,,
2,75,3.0,,720.0,PH1-2-3,IRP,174.0,,
3,85,3.0,,651.0,PH2-1-3,IR,369.0,,
4,97,10.0,,738.0,PH2-2-10,SR,236.0,,
...,...,...,...,...,...,...,...,...,...
1558,1919,14.0,,,Sesion 14 - MyHixel,Session 14,,1905.0,1906.0
1559,1936,8.0,,,Aurum - Sesion 8,Session 8,,,
1560,1937,9.0,,,Aurum - Sesion 9,Session 9,,,
1561,1938,10.0,,,Aurum - Sesion 10,Session 10,,,


### session_blocks

A session is composed of session_blocks. 

In [4]:

session_blocks = pd.read_csv('data/session_blocks.csv', on_bad_lines='skip', low_memory=False, header = None)
session_blocks.columns = ['id', 'session_id', 'time_duration', 
                              'created_at', 'updated_at',
                              'order', 'block_type', 'loop']
session_blocks2 = session_blocks.drop(['time_duration', 'created_at', 'updated_at',
                                       'loop'], axis = 1)
session_blocks2

Unnamed: 0,id,session_id,order,block_type
0,62,41,1,0
1,3239,1849,1,19
2,3240,1850,1,19
3,3241,1851,1,19
4,5,8,1,0
...,...,...,...,...
2981,3226,1836,1,19
2982,3227,1837,1,19
2983,3228,1838,1,19
2984,3229,1839,1,19


Several session_blocks can refer to the same session, each with a different order, as seen in the table below.

In [5]:
session_blocks2.loc[session_blocks2['session_id']==923] # A session can have multiple session_blocks with different order

Unnamed: 0,id,session_id,order,block_type
2118,2251,923,1,9
2119,2252,923,2,9
2120,2253,923,3,9
2121,2254,923,4,9
2122,2255,923,5,0
2123,2256,923,6,9
2124,2257,923,7,9
2125,2258,923,8,9
2126,2259,923,9,9


### session_sets

A session_block is composed of session_sets. Several session_sets can refer to the same session_block_id, each with a different order, as seen in the table below.

In [6]:

session_sets = pd.read_csv('data/session_sets.csv', on_bad_lines='skip', low_memory=False)
session_sets2 = session_sets.drop(['level', 'time_duration', 'reps', 
                                   'session_set_type','created_at', 'updated_at',
                                   'loop'], axis = 1)
session_sets2

Unnamed: 0,id,order,session_block_id
0,245,1,67
1,246,2,67
2,247,3,67
3,248,4,67
4,5,1,5
...,...,...,...
11540,12706,5,3357
11541,12707,6,3357
11542,12708,7,3357
11543,12709,8,3357


### exercise_sets

A session_set_id can have multiple exercise_id values, and an exercise_id can belong to multiple session_set_id values (a many-to-many relation). What matters here is the order of the exercises.

In [7]:
exercise_sets = pd.read_csv('data/exercise_sets.csv', on_bad_lines='skip', low_memory=False)
exercise_sets2 = exercise_sets.drop(['intensity_modificator', 'track_reps'], axis = 1)
exercise_sets2

Unnamed: 0,id,session_set_id,exercise_id,order,time_duration,reps,created_at,updated_at
0,1,5,5289,1,0.0,15.0,2020-10-23 13:53:44.337776,2020-10-23 13:53:44.337776
1,2,5,5218,2,0.0,30.0,2020-10-23 13:53:44.351294,2020-10-23 13:53:44.351294
2,3,5,5245,3,0.0,20.0,2020-10-23 13:53:44.365517,2020-10-23 13:53:44.365517
3,4,5,5293,4,0.0,5.0,2020-10-23 13:53:44.378357,2020-10-23 13:53:44.378357
4,5,6,5289,1,0.0,15.0,2020-10-23 13:53:44.411764,2020-10-23 13:53:44.411764
...,...,...,...,...,...,...,...,...
40833,42776,12710,5236,4,0.0,3.0,2022-03-14 19:42:03.209737,2022-03-14 19:42:03.209737
40834,18595,5774,5613,1,0.0,10.0,2021-03-10 18:13:55.496974,2022-03-31 23:19:01.27863
40835,18597,5775,5613,1,0.0,10.0,2021-03-10 18:13:55.530111,2022-03-31 23:19:01.288658
40836,18599,5776,5613,1,0.0,10.0,2021-03-10 18:13:55.567803,2022-03-31 23:19:01.294246


As the example below shows, session_set_id = 5 has 4 exercises with different orders (exercise_id = 5289, order = 1; exercise_id = 5218, order = 2; exercise_id = 5245, order = 3; exercise_id = 5293, order = 4). As noted previously, those exercises can also belong to other session_set_id values.

In [8]:
exercise_sets2.loc[exercise_sets2['session_set_id'] == 5]

Unnamed: 0,id,session_set_id,exercise_id,order,time_duration,reps,created_at,updated_at
0,1,5,5289,1,0.0,15.0,2020-10-23 13:53:44.337776,2020-10-23 13:53:44.337776
1,2,5,5218,2,0.0,30.0,2020-10-23 13:53:44.351294,2020-10-23 13:53:44.351294
2,3,5,5245,3,0.0,20.0,2020-10-23 13:53:44.365517,2020-10-23 13:53:44.365517
3,4,5,5293,4,0.0,5.0,2020-10-23 13:53:44.378357,2020-10-23 13:53:44.378357


The example below shows how exercise_id 5289 belongs to several different session_set_id values.

In [9]:
exercise_sets2.loc[exercise_sets2['exercise_id'] == 5289].head(15)

Unnamed: 0,id,session_set_id,exercise_id,order,time_duration,reps,created_at,updated_at
0,1,5,5289,1,0.0,15.0,2020-10-23 13:53:44.337776,2020-10-23 13:53:44.337776
4,5,6,5289,1,0.0,15.0,2020-10-23 13:53:44.411764,2020-10-23 13:53:44.411764
8,9,7,5289,1,0.0,15.0,2020-10-23 13:53:44.537815,2020-10-23 13:53:44.537815
12,13,8,5289,1,0.0,15.0,2020-10-23 13:53:44.658651,2020-10-23 13:53:44.658651
16,17,9,5289,1,0.0,15.0,2020-10-26 15:52:11.973966,2020-10-26 15:52:11.973966
20,21,10,5289,1,0.0,15.0,2020-10-26 15:52:12.041107,2020-10-26 15:52:12.041107
24,25,11,5289,1,0.0,15.0,2020-10-26 15:52:12.107468,2020-10-26 15:52:12.107468
28,29,12,5289,1,0.0,15.0,2020-10-26 15:52:12.178784,2020-10-26 15:52:12.178784
32,33,13,5289,1,0.0,15.0,2020-10-26 15:54:14.768327,2020-10-26 15:54:14.768327
103,104,41,5289,2,,10.0,2020-10-26 15:56:15.22282,2020-10-26 15:56:15.22282


### exercises

This table contains the catalog of exercises.

In [10]:
# Table of exercises
exercises = pd.read_csv('data/exercises.csv', sep = ';', on_bad_lines='skip', low_memory=False)
exercises2 = exercises.drop(['video','reps', 'time','legacy_id','deprecated', 
                             'replacement_legacy_id', 'family', 'sub_family',
                             'video_female', 'video_male', 'harder_variation_id',
                             'easier_variation_id', 'name_es', 'description_en',
       'description_es', 'implement_variation_id', 'test_correction',
       'thumbnail', 'thumbnail_male', 'thumbnail_female', 'notes_en',
       'notes_es', 'execution_time', 'thumbnail_400', 'thumbnail_400_male',
       'thumbnail_400_female', 'coach_id', 'test_equivalent_id', 't1_min',
       't1_max', 'excluded'], axis = 1)
exercises2.head(10)

Unnamed: 0,id,created_at,updated_at,body_parts_focused,muscles,joints,met_multiplier,name_en
0,5551,2020-10-15 12:37:19.622509,2021-08-24 13:43:14.829103,"{""Todo el cuerpo""}","{isquiotibiales,"" erector de la columna"","" dor...",{cadera},2.3,Straddle split
1,5528,2020-10-15 12:37:19.379916,2021-08-24 13:43:14.852983,{Piernas},"{cuádriceps,"" isquiotibiales"","" glúteos""}",{cadera},2.3,Side leg swing (left)
2,5216,2020-10-15 12:37:15.688677,2021-10-13 09:46:28.673338,{Core},"{""erector de la columna"","" recto mayor del abd...","{hombros,"" tobillos""}",2.5,Plank
3,5706,2020-10-15 12:37:21.440337,2021-09-22 17:13:47.654393,"{Brazos,Core}","{pectorales,"" dorsales"","" bíceps""}","{hombros,"" codos"","" muñecas""}",3.8,Archer row 2
4,5702,2020-10-15 12:37:21.397943,2021-09-22 17:13:47.669895,"{Espalda,Brazos}","{deltoides,"" tríceps"","" dorsales""}","{codos,"" hombros"","" muñecas""}",3.2,Chest fly 2
5,5802,2020-10-15 12:37:22.545302,2021-06-28 06:01:42.658248,{Brazos},{gluteos},{cadera},,Biking
6,5700,2020-10-15 12:37:21.369872,2021-09-22 17:13:47.677487,"{Core,Brazos}","{bíceps,"" recto mayor del abdomen"","" pectorales""}","{hombros,"" codos"","" muñecas""}",2.8,Bicep curl 2
7,5695,2020-10-15 12:37:21.318845,2021-09-22 17:13:47.684775,"{Core,Brazos}","{cuádriceps,"" glúteos""}","{cadera,"" rodillas""}",4.5,L-sit with rings 3
8,5691,2020-10-15 12:37:21.277469,2021-09-22 17:13:47.699902,"{Core,Brazos,Piernas}","{deltoides,"" recto mayor del abdomen"","" cuádri...","{hombros,"" cadera""}",3.8,Leg raises 3
9,5220,2020-10-15 12:37:15.743761,2021-10-13 09:46:28.718026,{Core},"{""erector de la columna"","" recto mayor del abd...","{cadera,"" hombros"","" tobillos"","" codos""}",2.8,Plank extension
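
The hierarchy described at the start of this section (sessions, session_blocks, session_sets, exercise_sets, exercises) can be flattened into one frame with chained merges. This is a minimal sketch on toy frames whose values are taken from session 685, which is used later in the test; the column prefixes and the helper `prefixed` are illustrative choices, not part of the original notebook. The name of exercise 5943 is not shown anywhere in this document, so it is deliberately left missing.

```python
import pandas as pd

# Toy frames with values from session 685 (subset of columns).
sessions_toy = pd.DataFrame({'id': [685], 'name_en': ['Session 2']})
blocks_toy = pd.DataFrame({'id': [1715], 'session_id': [685], 'order': [1]})
sets_toy = pd.DataFrame({'id': [7587], 'session_block_id': [1715], 'order': [1]})
exercise_sets_toy = pd.DataFrame({
    'id': [23501, 23500],
    'session_set_id': [7587, 7587],
    'exercise_id': [5968, 5943],
    'order': [2, 1],
})
exercises_toy = pd.DataFrame({'id': [5968], 'name_en': ['Rest']})


def prefixed(df, prefix):
    """Prefix every column so the joined frame has unambiguous names."""
    return df.add_prefix(prefix + '_')


hierarchy = (
    prefixed(sessions_toy, 'session')
    .merge(prefixed(blocks_toy, 'block'),
           left_on='session_id', right_on='block_session_id')
    .merge(prefixed(sets_toy, 'set'),
           left_on='block_id', right_on='set_session_block_id')
    .merge(prefixed(exercise_sets_toy, 'exset'),
           left_on='set_id', right_on='exset_session_set_id')
    # Left join: keep exercise_sets rows even if the exercise is not in the toy catalog.
    .merge(prefixed(exercises_toy, 'exercise'), how='left',
           left_on='exset_exercise_id', right_on='exercise_id')
    .sort_values(['block_order', 'set_order', 'exset_order'])
    .reset_index(drop=True)
)

print(hierarchy[['session_id', 'block_id', 'set_id',
                 'exset_exercise_id', 'exset_order', 'exercise_name_en']])
```

Prefixing the columns before merging avoids the `_x`/`_y` suffixes pandas would otherwise add, because every table has its own `id` and `order` columns.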


## Tables related to session_executions

These tables mirror the session tables, but correspond to the actual executions recorded when users exercised.

### session_block_executions

The order is the same as in the session_blocks table (see the test example).

In [11]:
# Table of blocks of session executions
session_block_executions = pd.read_csv('data/session_block_executions.csv', on_bad_lines='skip', low_memory=False)
session_block_executions2 = session_block_executions.drop(['block_type', 
                                                           'reps_executed',
                                                           'execution_time', 
                                                           'created_at', 
                                                           'updated_at'], axis = 1)
session_block_executions2

Unnamed: 0,id,session_execution_id,order
0,7077,5564,1
1,5005,4665,1
2,5006,4665,2
3,5007,4665,3
4,5008,4665,4
...,...,...,...
207967,214308,764891,8
207968,214309,764892,1
207969,214310,764893,1
207970,214311,764894,1
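
Because the order matches the session_blocks table, an executed block can be paired with its template block by joining on the session and the order. The sketch below illustrates this on toy frames built from the values of session execution 5612 that appear later in the test. It assumes that the order column of session_blocks may be loaded as strings (the test compares it against `str(session_block_order)`), so it is cast to integers first; the frame and column names ending in `_toy` or introduced by `rename` are illustrative.

```python
import pandas as pd

# Toy frames with values taken from the test section (subset of columns).
session_executions_toy = pd.DataFrame({'id': [5612], 'session_id': [685]})
block_executions_toy = pd.DataFrame({
    'id': [7191, 7192, 7193],
    'session_execution_id': [5612, 5612, 5612],
    'order': [1, 2, 3],
})
blocks_toy = pd.DataFrame({
    'id': [1715, 1716, 1717],
    'session_id': [685, 685, 685],
    'order': ['1', '2', '3'],  # assumption: loaded as strings
})

# Align dtypes so the join keys compare equal.
blocks_toy['order'] = blocks_toy['order'].astype(int)

paired_blocks = (
    block_executions_toy
    .rename(columns={'id': 'session_block_execution_id'})
    # Attach the session_id of each execution.
    .merge(session_executions_toy.rename(columns={'id': 'session_execution_id'}),
           on='session_execution_id')
    # Match each executed block with the template block of the same session and order.
    .merge(blocks_toy.rename(columns={'id': 'session_block_id'}),
           on=['session_id', 'order'])
    .sort_values('order')
    .reset_index(drop=True)
)

print(paired_blocks)
```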


### session_set_executions

The order is the same as in the session_sets table (see the test example).

In [12]:

session_set_executions = pd.read_csv('data/session_set_executions.csv', on_bad_lines='skip', low_memory=False)
session_set_executions2 = session_set_executions.drop(['reps_executed', 'execution_time',
                                                       ], axis = 1)
session_set_executions2

Unnamed: 0,id,order,created_at,updated_at,session_block_execution_id
0,43841,1,2021-10-15 14:50:10.1699,2021-10-15 14:50:10.1699,6205
1,43842,2,2021-10-15 14:50:10.202496,2021-10-15 14:50:10.202496,6205
2,43843,3,2021-10-15 14:50:10.237671,2021-10-15 14:50:10.237671,6205
3,43844,4,2021-10-15 14:50:10.270587,2021-10-15 14:50:10.270587,6205
4,43845,1,2021-10-15 14:50:10.306125,2021-10-15 14:50:10.306125,6206
...,...,...,...,...,...
849284,889697,4,2022-09-12 08:16:06.597768,2022-09-12 08:16:06.597768,214312
849285,889698,5,2022-09-12 08:16:06.61754,2022-09-12 08:16:06.61754,214312
849286,889699,6,2022-09-12 08:16:06.637444,2022-09-12 08:16:06.637444,214312
849287,889700,7,2022-09-12 08:16:06.657183,2022-09-12 08:16:06.657183,214312


### exercise_executions

This table contains the same exercise_id values as **exercise_sets**_exercise_id. The order is also the same as in exercise_sets.

In [13]:
# Table of exercise executions 
exercise_executions = pd.read_csv('data/exercise_executions.csv', on_bad_lines='skip', low_memory=False, header = None)
exercise_executions.columns = ['id', 'exercise_id', 'session_set_execution_id', 
                              'reps_executed', 'execution_time',
                              'order', 'created_at', 'updated_at']
exercise_executions

Unnamed: 0,id,exercise_id,session_set_execution_id,reps_executed,execution_time,order,created_at,updated_at
0,1279660,5236,47654,10,66,1,2021-10-29 13:07:01.918992,2021-10-29 13:07:01.918992
1,1279661,5968,47654,0,15,2,2021-10-29 13:07:01.924569,2021-10-29 13:07:01.924569
2,1279662,5317,47654,10,11,3,2021-10-29 13:07:01.92973,2021-10-29 13:07:01.92973
3,1279663,5968,47654,0,15,4,2021-10-29 13:07:01.934805,2021-10-29 13:07:01.934805
4,1279664,5222,47654,10,19,5,2021-10-29 13:07:01.940808,2021-10-29 13:07:01.940808
...,...,...,...,...,...,...,...,...
2190284,3116554,5968,669974,0,31,2,2022-05-27 07:41:53.378292,2022-05-27 07:41:53.378292
2190285,3116555,5870,669975,6,31,1,2022-05-27 07:41:53.386738,2022-05-27 07:41:53.386738
2190286,3116556,5968,669975,0,31,2,2022-05-27 07:41:53.390694,2022-05-27 07:41:53.390694
2190287,3116557,5870,669976,6,24,1,2022-05-27 07:41:53.399237,2022-05-27 07:41:53.399237


# Test of sessions and session executions
This test checks that the ids in the session tables and in the session_execution tables are consistent. If the relations are correct, the same **exercises**_id values should appear in **exercise_executions**_exercise_id and **exercise_sets**_exercise_id.

In [14]:
session_id = 685 # Id of the session executed
session_execution_id = 5612  # a session can be executed multiple times, so this id corresponds to one of those executions
session_block_order = 1 # Order of the block
sesion_set_order = 1 # order of the set

In [15]:
session_executions2.loc[session_executions2["session_id"] == session_id]

Unnamed: 0,id,user_program_id,difficulty_feedback,enjoyment_feedback,reps_executed,execution_time,updated_at,session_id,discarded,discard_reason
148,5612,5282,4,4,368,\N,2021-10-29 18:47:43.036039,685,f,\N
238,8060,4523,5,3,368,\N,2021-11-08 19:07:10.71468,685,f,\N
525,8962,34644,5,3,368,\N,2021-11-11 07:58:14.762,685,f,\N
572,7092,37090,5,3,368,\N,2021-11-05 03:46:34.865995,685,f,\N
659,7107,26439,3,4,368,\N,2021-11-05 05:53:30.780207,685,f,\N
...,...,...,...,...,...,...,...,...,...,...
736851,744678,77443,7,3,365,\N,2022-05-19 17:46:54.351867,685,f,\N
737253,745080,71267,5,3,365,\N,2022-05-21 16:11:40.914869,685,f,\N
737797,745624,77729,5,3,365,\N,2022-05-24 02:46:13.834661,685,f,\N
738016,745843,77635,8,2,365,\N,2022-05-24 18:27:46.577194,685,f,\N


In [16]:
sessions2.loc[sessions2['id'] == session_id]

Unnamed: 0,id,order,session_type,time_duration,code_name,name_en,calories,warmup_id,cooldown_id
30,685,2.0,,1105.0,PH1.2_V2,Session 2,,,


In [17]:
session_block_execution_id = session_block_executions2.loc[(session_block_executions2['session_execution_id'] == session_execution_id) &
                                                           (session_block_executions2['order'] == session_block_order), 'id'].values[0] 
print(session_block_execution_id)
session_block_executions2.loc[session_block_executions2['session_execution_id'] == session_execution_id]

7191


Unnamed: 0,id,session_execution_id,order
572,7191,5612,1
573,7192,5612,2
574,7193,5612,3


In [18]:
session_block_id = session_blocks2.loc[(session_blocks2['session_id'] == session_id) &
                    (session_blocks2['order'] == str(session_block_order)), 'id'].values[0] 
print(session_block_id)
session_blocks2.loc[session_blocks2['session_id'] == session_id]


1715


Unnamed: 0,id,session_id,order,block_type
1585,1715,685,1,18
1586,1716,685,2,18
1587,1717,685,3,18


In [19]:
session_set_executions2_id = session_set_executions2.loc[(session_set_executions2['session_block_execution_id'] == session_block_execution_id) &
                                                         (session_set_executions2['order'] == sesion_set_order), 'id'].values[0]
print(session_set_executions2_id)
session_set_executions2.loc[session_set_executions2['session_block_execution_id'] == session_block_execution_id]

48147


Unnamed: 0,id,order,created_at,updated_at,session_block_execution_id
1540,48147,1,2021-10-29 18:47:42.583787,2021-10-29 18:47:42.583787,7191
1541,48148,2,2021-10-29 18:47:42.598412,2021-10-29 18:47:42.598412,7191
1542,48149,3,2021-10-29 18:47:42.611629,2021-10-29 18:47:42.611629,7191
1543,48150,4,2021-10-29 18:47:42.633059,2021-10-29 18:47:42.633059,7191
1544,48151,5,2021-10-29 18:47:42.646835,2021-10-29 18:47:42.646835,7191
1545,48152,6,2021-10-29 18:47:42.66008,2021-10-29 18:47:42.66008,7191
1546,48153,7,2021-10-29 18:47:42.673405,2021-10-29 18:47:42.673405,7191
1547,48154,8,2021-10-29 18:47:42.690885,2021-10-29 18:47:42.690885,7191


In [20]:
session_set_id = session_sets2.loc[(session_sets2['session_block_id'] == session_block_id) &
                                   (session_sets2['order'] == sesion_set_order), 'id'].values[0]
print(session_set_id)
session_sets2.loc[session_sets2['session_block_id'] == session_block_id]

7587


Unnamed: 0,id,order,session_block_id
6855,7587,1,1715
6856,7588,2,1715
6857,7589,3,1715
6858,7590,4,1715
6859,7591,5,1715
6860,7592,6,1715
6861,7593,7,1715
6862,7594,8,1715


In [21]:
exercise_executions.loc[exercise_executions['session_set_execution_id'] == session_set_executions2_id]

Unnamed: 0,id,exercise_id,session_set_execution_id,reps_executed,execution_time,order,created_at,updated_at
4926,1281063,5943,48147,30,67,1,2021-10-29 18:47:42.590858,2021-10-29 18:47:42.590858
4927,1281064,5968,48147,0,10,2,2021-10-29 18:47:42.59586,2021-10-29 18:47:42.59586


In [22]:
exercise_sets.loc[exercise_sets['session_set_id'] == session_set_id]

Unnamed: 0,id,session_set_id,exercise_id,order,intensity_modificator,time_duration,reps,created_at,updated_at,track_reps
21855,23501,7587,5968,2,,10.0,0.0,2021-06-08 00:11:03.262836,2021-06-08 00:11:03.262836,f
21902,23500,7587,5943,1,,0.0,30.0,2021-06-08 00:11:03.25158,2021-09-01 17:54:52.632046,f


In [23]:
exercises2.loc[exercises2['id'] == 5968] # Select based on the previous tables

Unnamed: 0,id,created_at,updated_at,body_parts_focused,muscles,joints,met_multiplier,name_en
500,5968,2020-10-16 09:33:15.172657,2022-01-21 19:15:56.401181,{},{0},{0},1.0,Rest
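
The manual comparison carried out in the cells above could be wrapped in a small helper. This is only a sketch: `exercises_match` is a hypothetical function name, and the toy frames reuse the rows printed by cells 21 and 22 with a subset of their columns. Both sides are sorted by order before comparing, because the rows of exercise_sets are not necessarily stored in order.

```python
import pandas as pd


def exercises_match(exercise_executions, exercise_sets,
                    session_set_execution_id, session_set_id):
    """Return True if the executed set and the planned set list the
    same exercise_id values in the same order."""
    executed = (
        exercise_executions
        .loc[exercise_executions['session_set_execution_id'] == session_set_execution_id]
        .sort_values('order')['exercise_id']
        .tolist()
    )
    planned = (
        exercise_sets
        .loc[exercise_sets['session_set_id'] == session_set_id]
        .sort_values('order')['exercise_id']
        .tolist()
    )
    return executed == planned


# Rows from the outputs of cells 21 and 22 (subset of columns).
exercise_executions_toy = pd.DataFrame({
    'exercise_id': [5943, 5968],
    'session_set_execution_id': [48147, 48147],
    'order': [1, 2],
})
exercise_sets_toy = pd.DataFrame({
    'session_set_id': [7587, 7587],
    'exercise_id': [5968, 5943],
    'order': [2, 1],
})

result = exercises_match(exercise_executions_toy, exercise_sets_toy,
                         session_set_execution_id=48147, session_set_id=7587)
print(result)
```

The same helper could be called with the full exercise_executions and exercise_sets frames and the ids obtained in the previous cells.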


# End of the test

### user_programs

This table captures the many-to-many relationship between users and programs. When a user exercises and records a session execution, the **session_executions**_user_program_id is stored in the session_executions table. Through this foreign key, a session execution can be traced back to the user_id and the program in which the user is enrolled.

In [25]:
user_programs = pd.read_csv('data/user_programs.csv', on_bad_lines='skip', low_memory=False)
user_programs2 = user_programs.drop(['enjoyment_notes'], axis = 1)
user_programs2

Unnamed: 0,id,user_id,program_id,created_at,updated_at,active,current_session_id,completed,enjoyment
0,52212,13050,5,2021-12-28 01:19:09.882505,2021-12-28 01:19:09.882505,f,288.0,f,
1,69411,15759,5,2022-02-14 19:09:00.841042,2022-02-14 19:09:00.841042,f,288.0,f,
2,2090,779,9,2021-07-21 13:48:09.456286,2021-07-21 13:48:09.456286,t,699.0,f,
3,52213,13050,36,2021-12-28 01:19:09.905606,2021-12-28 01:19:09.905606,f,634.0,f,
4,1510,597,14,2021-06-10 20:08:55.202574,2021-11-24 17:39:02.23625,t,726.0,f,
...,...,...,...,...,...,...,...,...,...
81316,48836,12063,428,2021-11-30 15:47:58.580229,2022-09-12 17:18:04.751141,f,1652.0,f,
81317,85188,12534,29,2022-09-12 20:26:10.878139,2022-09-12 20:26:10.878139,t,536.0,f,
81318,85189,19915,13,2022-09-12 21:49:34.709577,2022-09-12 21:49:34.709577,t,704.0,f,
81319,85190,20065,13,2022-09-12 21:53:12.717518,2022-09-12 21:53:12.717518,t,704.0,f,
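
The foreign-key lookup described above amounts to a many-to-one merge from session executions to user_programs. In the sketch below, the two user_programs rows are taken from the output above, while the session execution rows (ids 1 and 2) are hypothetical and exist only to illustrate the join.

```python
import pandas as pd

# Rows taken from the user_programs output above (subset of columns).
user_programs_toy = pd.DataFrame({
    'id': [2090, 1510],
    'user_id': [779, 597],
    'program_id': [9, 14],
})

# Hypothetical session executions pointing at those user programs.
session_executions_toy = pd.DataFrame({
    'id': [1, 2],
    'user_program_id': [2090, 1510],
})

# Many session executions can point to one user_program, never the reverse;
# validate raises an error if user_program ids are duplicated on the right side.
executions_with_user = session_executions_toy.merge(
    user_programs_toy.rename(columns={'id': 'user_program_id'}),
    on='user_program_id',
    how='left',
    validate='many_to_one',
)

print(executions_with_user)
```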


### users

This table contains all the information about the users.

In [26]:
users = pd.read_csv('data/users.csv', low_memory=False)
users2 = users.drop(['email', 'encrypted_password', 
                     'reset_password_token','reset_password_sent_at',
                     'remember_created_at','is_admin','names', 'last_name',
                     'current_sign_in_ip', 'last_sign_in_ip', 
                     'recover_password_code','recover_password_attempts', 
                     'facebook_uid','workout_setting_voice_coach', 'workout_setting_sound',
                     'workout_setting_vibration', 'workout_setting_mobility',
                     'workout_setting_cardio_warmup', 'workout_setting_countdown',
                     'google_uid','t1_push','t1_core', 
                     't1_legs', 't1_full', 't1_push_exercise', 
                     't1_pull_up','t2_reps', 't2_steps', 
                     't2_reps_push', 't2_reps_core', 't2_reps_legs',
                     't2_reps_full', 't2_time_push', 't2_time_core',
                     't2_time_legs', 't2_time_full', 't1_full_exercise', 
                     't1_pull_up_exercise','warmup_setting', 
                     'warmup_session_id', 'stripe_id', 'provider', 'uid',
                     'affiliate_code', 'moengage_id', 'mix_panel_id',
                     'apple_id_token','platform', 'login_token',
                     'login_token_generated_at', 'imported',
                     'current_sign_in_at', 'last_sign_in_at', 'sign_in_count',
                    'current_weekly_streak'], 
                    axis = 1)
users2

Unnamed: 0,id,created_at,updated_at,gender,date_of_birth,height,weight,activity_level,goal,body_type,body_fat,newsletter_subscription,notifications_setting,training_days_setting,language,country,points,scientific_data_usage,best_weekly_streak,affiliate_code_signup,total_sessions,total_time,kcal_per_session,reps_per_session
0,1880,2021-10-25 11:02:55.764914,2021-12-22 06:39:38.014311,1,1982-08-26,185.0,105.5,2,0,1,40.0,t,t,3,es,ES,25884,f,0,,,,,
1,747,2021-07-09 18:42:40.52939,2021-12-22 06:39:37.955606,1,2000-01-01,160.0,60.0,1,1,0,20.0,t,t,4,es,,100,f,1,,1.0,74.0,254.0,209.0
2,3469,2021-10-28 06:01:37.777493,2021-12-22 06:39:38.051766,1,1963-12-31,180.0,105.5,1,0,2,35.0,t,t,3,es,ES,580,f,0,,,,,
3,1876,2021-10-25 11:02:54.597607,2021-12-22 06:39:38.074943,1,1977-03-03,184.0,118.0,1,0,2,35.0,t,t,3,es,MX,0,f,0,,,,,
4,1886,2021-10-25 11:02:57.34532,2022-04-28 23:52:27.486079,1,1979-04-22,173.0,75.6,2,1,1,40.0,t,t,3,es,ES,11014,f,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18683,20062,2022-09-12 20:57:51.854812,2022-09-12 20:57:51.854812,1,1983-06-16,172.0,72.0,1,1,1,20.0,t,t,1,es,,0,f,0,,,,,
18684,20063,2022-09-12 21:05:34.949417,2022-09-12 21:05:34.949417,0,1985-06-05,165.0,50.0,0,2,0,20.0,t,t,1,es,,0,f,0,,,,,
18685,20065,2022-09-12 21:50:48.184245,2022-09-12 21:53:03.164483,1,2008-02-24,162.0,50.0,1,1,0,20.0,t,t,3,es,,0,f,0,,,,,
18686,20066,2022-09-12 22:29:06.094089,2022-09-12 22:29:06.094089,1,2007-11-07,177.0,55.0,2,0,0,20.0,f,t,1,es,,0,f,0,,,,,


### programs

This table contains all the information about the programs.

In [31]:

programs = pd.read_csv('data/programs.csv', on_bad_lines='skip', low_memory=False)
programs2 = programs.drop(['user_id', 'code_name', 'name_es', 
                           'description_es', 'auto_generated', 'priority_order', 
                           'next_program_id'], axis = 1)
programs2

Unnamed: 0,id,created_at,updated_at,pro,available,strength,endurance,technique,flexibility,intensity,name_en,description_en
0,326,2021-09-30 05:34:36.774594,2021-09-30 05:34:36.774594,t,f,0,0,0,0,0,Pro,Description en
1,343,2021-10-05 10:43:58.6371,2021-10-05 10:43:58.6371,t,f,0,0,0,0,0,Pro,Description en
2,22,2020-11-23 14:03:11.161414,2021-09-29 14:53:34.268573,f,t,5,4,3,3,4,Smash your goals,The ultimate program that prepares your body t...
3,92,2021-09-16 15:02:16.599309,2021-09-29 14:53:34.694305,t,f,0,0,0,0,0,Pro,Description en
4,10,2020-11-23 13:41:46.587265,2021-09-29 14:53:34.897432,f,t,1,3,1,2,2,Get motivated!,The ultimate beginner’s program designed to he...
...,...,...,...,...,...,...,...,...,...,...,...,...
478,497,2021-11-30 20:34:52.683516,2021-11-30 20:34:52.683516,t,f,0,0,0,0,0,Pro,Description en
479,498,2021-12-04 04:17:07.202851,2021-12-04 04:17:07.202851,t,f,0,0,0,0,0,Pro,Description en
480,504,2022-01-14 03:56:57.967608,2022-01-19 11:52:56.327901,f,f,0,0,0,0,0,legacy,
481,503,2022-01-07 11:52:02.285,2022-01-12 12:24:26.332726,t,t,3,3,2,4,2,Aurum,Aurum is a program designed to help sedentary ...


### program_sessions

Join table for the many-to-many relationship between programs and sessions.

In [32]:
program_sessions = pd.read_csv('data/program_sessions.csv', on_bad_lines='skip', low_memory=False)
program_sessions

Unnamed: 0,id,program_id,session_id,created_at,updated_at
0,662,49,586,2021-03-12 16:12:46.327341,2021-03-12 16:12:46.327341
1,855,24,778,2021-06-08 09:07:21.138981,2021-06-08 09:07:21.138981
2,856,24,779,2021-06-08 09:07:40.74051,2021-06-08 09:07:40.74051
3,857,24,780,2021-06-08 09:07:59.657941,2021-06-08 09:07:59.657941
4,279,5,288,2020-12-22 20:21:04.037297,2020-12-22 20:21:04.037297
...,...,...,...,...,...
1386,1783,503,1938,2022-01-07 19:14:08.725028,2022-01-07 19:14:08.725028
1387,1784,503,1939,2022-01-07 19:14:46.715396,2022-01-07 19:14:46.715396
1388,1785,503,1940,2022-01-07 19:15:23.37102,2022-01-07 19:15:23.37102
1389,1786,504,1941,2022-01-14 03:56:58.062909,2022-01-14 03:56:58.062909
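
A join table like this one is typically used in a two-step merge: from programs to program_sessions, and from there to sessions. The sketch below shows that pattern on toy frames built from rows that appear in the outputs of this document; only one session name (id 1938) is visible in those outputs, so the other names are left missing rather than guessed. The renamed columns are illustrative.

```python
import pandas as pd

# Rows taken from the programs, program_sessions and sessions outputs (subset of columns).
programs_toy = pd.DataFrame({'id': [503, 504], 'name_en': ['Aurum', 'legacy']})
program_sessions_toy = pd.DataFrame({
    'program_id': [503, 503, 503, 504],
    'session_id': [1938, 1939, 1940, 1941],
})
sessions_toy = pd.DataFrame({'id': [1938], 'name_en': ['Session 10']})

program_to_session = (
    programs_toy
    .rename(columns={'id': 'program_id', 'name_en': 'program_name'})
    # Step 1: programs -> join table.
    .merge(program_sessions_toy, on='program_id')
    # Step 2: join table -> sessions (left join keeps sessions missing from the toy frame).
    .merge(sessions_toy.rename(columns={'id': 'session_id', 'name_en': 'session_name'}),
           on='session_id', how='left')
)

print(program_to_session)
```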


### program_profiles

Join table for the many-to-many relationship between programs and profiles.

In [46]:
program_profiles = pd.read_csv('data/program_profiles.csv', on_bad_lines='skip', low_memory=False)
program_profiles

Unnamed: 0,id,program_id,profile_id,created_at,updated_at
0,1,34,1,2020-11-23 16:10:14.679167,2020-11-23 16:10:14.679167
1,2,28,20,2020-11-27 13:41:47.642457,2020-11-27 13:41:47.642457
2,3,28,19,2020-11-27 13:41:47.648063,2020-11-27 13:41:47.648063
3,4,27,17,2020-11-27 13:43:07.059832,2020-11-27 13:43:07.059832
4,5,27,15,2020-11-27 13:43:07.065767,2020-11-27 13:43:07.065767
...,...,...,...,...,...
623,626,500,16,2022-01-14 14:14:49.554568,2022-01-14 14:14:49.554568
624,627,500,17,2022-01-14 14:14:49.5583,2022-01-14 14:14:49.5583
625,628,500,18,2022-01-14 14:14:49.56212,2022-01-14 14:14:49.56212
626,629,500,19,2022-01-14 14:14:49.565826,2022-01-14 14:14:49.565826


### profiles

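It is not clear exactly what this table represents. Judging by its columns, each profile seems to combine a gender, an activity level, a goal, and a body-fat range.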

In [52]:
profiles = pd.read_csv('data/profiles.csv')
profiles2 = profiles.drop(['fat_level', 'name', 'created_at', 'updated_at'], axis = 1)
profiles2.head(10)

Unnamed: 0,id,gender,activity_level,goal,max_fat_level,min_fat_level
0,19,0,0,1,29.99,0
1,20,0,0,1,100.0,30
2,15,0,0,0,29.99,0
3,16,0,0,0,100.0,30
4,28,1,1,1,100.0,25
5,32,1,1,1,24.99,0
6,29,1,1,2,100.0,25
7,30,1,1,2,24.99,0
8,31,1,1,0,24.99,0
9,27,1,1,0,100.0,25


### program_characteristics

A program can have different characteristics (this does not make much sense).

In [55]:

program_characteristics = pd.read_csv('data/program_characteristics.csv', on_bad_lines='skip', low_memory=False)
# Keep every column: value_en and value_es hold the characteristic texts shown below
program_characteristics2 = program_characteristics
program_characteristics2.loc[program_characteristics2['program_id'] == 30]

Unnamed: 0,id,program_id,created_at,updated_at,objective,value_en,value_es
0,164,30,2021-07-23 17:13:06.961104,2021-07-23 17:13:06.961104,f,A program created by Marcos Vázquez from Fitne...,Un programa creado por Marcos Vázquez de Fitne...
1,165,30,2021-07-23 17:13:06.96763,2021-07-23 17:13:06.96763,f,Combine strength and hypertrophy with progress...,Combina fuerza e hipertrofia con ejercicios pr...
22,166,30,2021-07-23 17:13:06.971463,2021-07-23 17:13:06.971463,f,"More than 100 exercises with rings, all adapta...","Más de 100 ejercicios con anillas, todos adapt..."
39,167,30,2021-07-23 17:13:06.975458,2021-07-23 17:13:06.975458,t,Increase in strength and muscles of the upper ...,Aumento de fuerza y musculatura del cuerpo sup...
40,168,30,2021-07-23 17:13:06.979908,2021-07-23 17:13:06.979908,t,"Mobility, stability and flexibility on your up...","Movilidad, estabilidad y flexibilidad de tu tr..."
41,169,30,2021-07-23 17:13:06.983818,2021-07-23 17:13:06.983818,t,Ease of progression: from beginner to advanced.,Facilidad de progresión: de novato a avanzado.
42,170,30,2021-07-23 17:13:06.987637,2021-07-23 17:13:06.987637,t,Fun and versatility.,Diversión y versatilidad.


### subscriptions

In [63]:
subscriptions = pd.read_csv('data/subscriptions.csv', on_bad_lines='skip', low_memory=False)
subscriptions2 = subscriptions.drop(['platform', 'transaction_body', 'start_date', 
                                     'end_date', 'subscription_type', 'cancelled_at',
                                     'cancelled', 'store_metadata','offer_code',
       'cancellation_reason', 'receipt_data'], axis = 1)
subscriptions2

Unnamed: 0,id,user_id,product_id,program_id,status,created_at,updated_at,affiliate_code
0,1353,1907,74,10,1.0,2021-10-25 11:03:03.245558,2021-10-26 11:50:42.563295,
1,1352,645,74,8,1.0,2021-10-25 11:03:02.992647,2021-10-26 11:51:50.627662,
2,1036,1604,74,6,1.0,2021-10-25 11:01:41.622483,2021-10-26 22:25:02.543061,
3,272,529,74,13,1.0,2021-04-22 10:22:02.773388,2021-11-14 10:53:21.222846,
4,1173,1737,74,6,1.0,2021-10-25 11:02:17.157806,2021-12-21 10:00:10.592175,
...,...,...,...,...,...,...,...,...
13153,13543,16493,74,25,0.0,2022-03-24 09:13:26.441364,2022-03-24 09:13:26.441364,
13154,13544,16494,74,10,0.0,2022-03-24 09:13:26.454334,2022-03-24 09:13:26.454334,
13155,13545,16495,74,9,0.0,2022-03-24 09:13:26.4674,2022-03-24 09:13:26.4674,
13156,13546,16497,74,14,0.0,2022-03-24 09:13:26.482328,2022-03-24 09:13:26.482328,



Subscriptions define a many-to-many relation between users and programs. A user can be subscribed to different programs (see the example below).

In [59]:
subscriptions2['user_id'].value_counts()

18655    15
12710     7
13400     5
7960      5
8031      4
         ..
3689      1
3690      1
3691      1
3692      1
16498     1
Name: user_id, Length: 13023, dtype: int64
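
Note that `value_counts` counts subscription rows per user, which is not necessarily the number of distinct programs. A minimal sketch of the difference, using hypothetical user and program ids that do not come from the dataset:

```python
import pandas as pd

# Hypothetical subscriptions: user 1 subscribed to program 10 twice and to program 8 once.
subscriptions_toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2],
    'program_id': [10, 8, 10, 6],
})

# Number of subscription rows per user (what value_counts reports).
rows_per_user = subscriptions_toy['user_id'].value_counts()

# Number of distinct programs per user.
programs_per_user = subscriptions_toy.groupby('user_id')['program_id'].nunique()

print(rows_per_user)
print(programs_per_user)
```

Applying the `groupby` line to subscriptions2 would show how many different programs each user has actually subscribed to.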

The remaining tables are ignored because, according to Oriol, they were not completely implemented.

In [65]:
# Table of implements
implements = pd.read_csv('data/implements.csv', on_bad_lines='skip', low_memory=False)
implements2 = implements.drop(['created_at', 'updated_at', 'name_es'], axis = 1)

# Table of program implements
program_implements = pd.read_csv('data/program_implements.csv', on_bad_lines='skip', low_memory=False)
program_implements2 = program_implements.drop(['created_at', 'updated_at'], axis = 1)

# Table of user implements
user_implements = pd.read_csv('data/user_implements.csv', on_bad_lines='skip', low_memory=False)
user_implements2 = user_implements.drop(['created_at', 'updated_at'], axis = 1)

# Table of exercise implements 
exercise_implements = pd.read_csv('data/exercise_implements.csv', on_bad_lines='skip', low_memory=False)
exercise_implements2 = exercise_implements.drop(['created_at', 'updated_at'], axis = 1)