# Script for TMS Data Cleaning

<strong><em>Important: This guide explains the data cleaning we did before this hackathon. There are parts you can, and sometimes should, copy and paste directly. You won't be able to copy the whole notebook and run it within your project.</em></strong>

## Creating the Client Connection to the Cloud Object Storage and the "smart-city-tms" bucket

The following code cell can be inserted automatically through the Notebook UI. To do so, click on the data button (top right corner), where you will find the *files* and *connections* tabs. Go to *connections*, as we want to create a client for our Cloud Object Storage.

There you will find the connection that we created before. Click "insert to code" and choose the "StreamingBody object" option. A pop-up then opens that shows the folder structure of the underlying cloud bucket. Navigate through the folders and subfolders until you reach the last subfolder, which contains all the .json files we need. Choose one file and click *Select*. You will then see an automatically inserted code cell that looks like the one below, except that it contains the correct API keys etc.

> It doesn't matter which .json file you choose, because later on we only use the created client object to access more than one .json file.

In [None]:
# @hidden_cell


# This connection object is used to access your data and contains your credentials or project token.
# You might want to remove those credentials before you share your notebook.

import os
import types
import pandas as pd
import ibm_boto3
from botocore.client import Config
import logging

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.

connection_TMS_trafficData_client = ibm_boto3.client(
    service_name='s3',
    ibm_api_key_id='api-key',
    ibm_service_instance_id='service-instance-id',
    ibm_auth_endpoint='https://iam.cloud.ibm.com/identity/token',
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-de.cloud-object-storage.appdomain.cloud'
)

body = connection_TMS_trafficData_client.get_object(Bucket='smart-city-tms', Key='topics/digitraffic_tms/partition=0/digitraffic_tms+0+0011067000.json')['Body'].read()
# add missing __iter__ method, so pandas accepts body as file-like object 

if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# Since JSON data can be semi-structured and contain additional metadata, it is possible that you might face an error during data loading.
# Refer to the documentation of 'pandas.read_json()' and 'pandas.io.json.json_normalize' for more possibilities to adjust the data loading.
# pandas documentation: http://pandas.pydata.org/pandas-docs/stable/io.html#io-json-reader
# and http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html

pd.read_json(body, lines=True)

After we have successfully created the client (yours might be named differently than `connection_TMS_trafficData_client`; either change it in the cell above or remember to change the name wherever the client is used), we need a function that can access more than one .json file.

As we know, the data is stored as S3-style objects. The following cell shows the function we use to access files by the timespan in which they were written to the storage. (Don't stress about how the function works in detail 😉) Just copy and paste it.


In [2]:
import argparse
import boto3
import dateutil.parser
import logging
import pytz
from collections import namedtuple

import pandas as pd
from datetime import datetime, timezone, timedelta



logger = logging.getLogger(__name__)


Rule = namedtuple('Rule', ['has_min', 'has_max'])
last_modified_rules = {
    Rule(has_min=True, has_max=True):
        lambda min_date, date, max_date: min_date <= date <= max_date,
    Rule(has_min=True, has_max=False):
        lambda min_date, date, max_date: min_date <= date,
    Rule(has_min=False, has_max=True):
        lambda min_date, date, max_date: date <= max_date,
    Rule(has_min=False, has_max=False):
        lambda min_date, date, max_date: True,
}

def get_s3_objects(s3, bucket, prefixes=None, suffixes=None, last_modified_min=None, last_modified_max=None):
    
    if last_modified_min and last_modified_max and last_modified_max < last_modified_min:
        raise ValueError(
            "When using both, last_modified_max: {} must be greater than last_modified_min: {}".format(
                last_modified_max, last_modified_min
            )
        )
    # Use the last_modified_rules dict to lookup which conditional logic to apply
    # based on which arguments were supplied
    last_modified_rule = last_modified_rules[bool(last_modified_min), bool(last_modified_max)]

    if not prefixes:
        prefixes = ('',)
    else:
        prefixes = tuple(set(prefixes))
    if not suffixes:
        suffixes = ('',)
    else:
        suffixes = tuple(set(suffixes))

    kwargs = {'Bucket': bucket}

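    for prefix in prefixes:
        kwargs['Prefix'] = prefix
        # Drop any token left over from paginating the previous prefix.
        kwargs.pop('ContinuationToken', None)
        while True: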
            # The S3 API response is a large blob of metadata.
            # 'Contents' contains information about the listed objects.
            resp = s3.list_objects_v2(**kwargs)
            for content in resp.get('Contents', []):
                last_modified_date = content['LastModified']
                if (
                    content['Key'].endswith(suffixes) and
                    last_modified_rule(last_modified_min, last_modified_date, last_modified_max)
                ):
                    yield content

            # The S3 API is paginated, returning up to 1000 keys at a time.
            # Pass the continuation token into the next response, until we
            # reach the final page (when this field is missing).
            try:
                kwargs['ContinuationToken'] = resp['NextContinuationToken']
            except KeyError:
                break
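To see how the pagination loop behaves without touching real storage, here is a minimal sketch that runs the same `list_objects_v2` / `NextContinuationToken` pattern against a hypothetical in-memory stub. `FakeS3`, `list_keys`, the page size, and the sample keys are made up for illustration; only the request and response field names follow the listing call used above.

```python
from datetime import datetime, timezone


class FakeS3:
    """Hypothetical stand-in for the storage client; serves keys in pages."""

    def __init__(self, objects, page_size=2):
        self.objects = objects
        self.page_size = page_size

    def list_objects_v2(self, Bucket, Prefix='', ContinuationToken=None):
        matching = [o for o in self.objects if o['Key'].startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = matching[start:start + self.page_size]
        resp = {'Contents': page}
        if start + self.page_size < len(matching):
            # More pages follow: hand back a token for the next request.
            resp['NextContinuationToken'] = str(start + self.page_size)
        return resp


def list_keys(s3, bucket, prefix='', newer_than=None):
    """Yield keys under `prefix`, following continuation tokens page by page."""
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        resp = s3.list_objects_v2(**kwargs)
        for content in resp.get('Contents', []):
            if newer_than is None or content['LastModified'] >= newer_than:
                yield content['Key']
        token = resp.get('NextContinuationToken')
        if token is None:
            break  # last page reached
        kwargs['ContinuationToken'] = token


def utc(hour):
    return datetime(2022, 2, 25, hour, tzinfo=timezone.utc)


fake = FakeS3([
    {'Key': 'topics/a.json', 'LastModified': utc(9)},
    {'Key': 'topics/b.json', 'LastModified': utc(10)},
    {'Key': 'topics/c.json', 'LastModified': utc(11)},
    {'Key': 'other/d.json', 'LastModified': utc(11)},
    {'Key': 'topics/e.json', 'LastModified': utc(12)},
])

keys = list(list_keys(fake, 'demo-bucket', prefix='topics/', newer_than=utc(10)))
print(keys)
```

The stub needs two requests to serve the four matching keys, so the loop has to follow one continuation token before it stops.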

## Defining the Timespan for Which JSONs/S3 Objects Will Be Collected

As introduced above, we can access the TMS data through our client and define a timespan for which we want to get the data.
We also need metadata, which might not have been written within the chosen time window. So leave `metastarttime` as it is.

We decided to create three variables,
`dateloading`,
`starttime`, and
`endtime`, to define this timespan.

> Even for a few hours, the collected data can add up to 1,000,000+ rows. To get a feeling for the data cleaning process, a small timespan (one hour) is more than enough. Also remember that if you change the date and time, there may be no data available.

The use of `metastarttime` is explained later on.

In [3]:
dateloading = "2022-02-25"
#dateloading = str(os.environ['DATE'])
starttime = datetime.fromisoformat(dateloading + ' 10:00:00.000+00:00')
endtime = datetime.fromisoformat(dateloading + ' 12:00:00.000+00:00')
metastarttime = datetime.fromisoformat("2022-02-20" + ' 00:00:00.000+00:00')
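The `+00:00` suffix in these strings is what makes the resulting datetimes timezone-aware. The sketch below assumes that the client returns `LastModified` as a timezone-aware UTC datetime (as boto3-style clients usually do; this is an assumption for the IBM client). Python refuses to compare aware and naive datetimes, so a window defined without the offset would fail in the filter.

```python
from datetime import datetime, timezone

aware = datetime.fromisoformat('2022-02-25 10:00:00.000+00:00')
naive = datetime.fromisoformat('2022-02-25 10:00:00.000')

# Stand-in for an object's LastModified value (assumed to be aware UTC).
last_modified = datetime(2022, 2, 25, 11, 30, tzinfo=timezone.utc)

in_window = aware <= last_modified  # aware vs. aware: works

try:
    naive <= last_modified          # naive vs. aware: raises TypeError
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(in_window, mixed_ok)
```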


Through the `connection_TMS_trafficData_client` and with the function `get_s3_objects` defined above, we can now access the S3 objects that were written within the defined time window.

In [4]:
objs=get_s3_objects(s3=connection_TMS_trafficData_client,bucket="smart-city-tms", last_modified_min=starttime, last_modified_max=endtime)

The variable `objs` now yields these S3 objects. They only describe the stored files (key, modification date, and so on) and are not yet the .json data we want to work with. So one more step is required to extract the data into our pandas.DataFrame.

## Reading the TMS Measurements from the Variable objs and Storing Them in a DataFrame

To retrieve our data, we iterate through every `obj` in `objs` and collect the extracted data step by step into the DataFrame `df_tms`. To do so, we use our client again: we call `.get_object()` with the bucket we want to access and the key of the object, which is stored in `obj['Key']`, and then read the `Body` of the response, which contains the lines in .json format. Around that call, we use `pd.read_json()` to parse the JSON. After extracting the data from every `obj` in `objs`, we reset the index; the old index is kept as an `index` column and dropped in a later step.

We then take a first quick data cleaning step by throwing out the rows that don't contain a `roadStationId`, since we can't really work with values whose identification is missing.

In [5]:
frames = []
for obj in objs:
    body = connection_TMS_trafficData_client.get_object(Bucket='smart-city-tms', Key=obj['Key'])['Body'].read()
    frames.append(pd.read_json(body, lines=True))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df_tms = pd.concat(frames) if frames else pd.DataFrame()
df_tms = df_tms.reset_index()
df_tms = df_tms[df_tms['roadStationId'].notna()]
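The loading pattern can be tried offline with a few lines of JSON Lines text. This is a minimal sketch: the two payloads are hypothetical stand-ins for object bodies, and only the column names `roadStationId`, `id`, and `sensorValue` are taken from the real data.

```python
import io
import pandas as pd

# Two hypothetical payloads in JSON Lines format (one JSON object per line).
payloads = [
    '{"roadStationId": 23575, "id": 5158, "sensorValue": 139}\n'
    '{"roadStationId": null, "id": 5164, "sensorValue": 1}\n',
    '{"roadStationId": 23576, "id": 5054, "sensorValue": 69}\n',
]

frames = [pd.read_json(io.StringIO(text), lines=True) for text in payloads]

# Concatenate once, keep the per-file index as an `index` column,
# then drop rows without a station id.
df_demo = pd.concat(frames).reset_index()
df_demo = df_demo[df_demo['roadStationId'].notna()]

print(df_demo)
```

Collecting the frames in a list and concatenating once avoids copying the growing DataFrame on every iteration.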

Now let's check whether that worked:

In [None]:
df_tms

How many rows x columns do we have at hand? 

In [6]:
df_tms.shape

(780280, 15)

## Converting measuredTime into Another Timestamp Format

We want to do this because
1. SPSS Modeler does not understand the format given in `measuredTime`.
2. We may want to enrich the data by JOINing it with another resource.

So we loop over every `measuredTime` value, truncate it to the full hour, and save the changed version in a list, which then becomes the new column `timestamp`.


In [8]:
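import datetime as dt  # the alias avoids shadowing the `datetime` class imported earlier

date = []

# Truncate each measuredTime (e.g. '2022-03-12T23:59:35Z') to the full hour.
for x in df_tms.measuredTime:
    x = x[:-7]  # drop ':MM:SSZ', leaving 'YYYY-MM-DDTHH'
    date_obj = dt.datetime.strptime(x, '%Y-%m-%dT%H')
    date.append(str(date_obj.date()) + " " + str(date_obj.time()))


df_tms["timestamp"] = date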
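Slicing off the last seven characters only works while every value has exactly the same length. A minimal alternative sketch, assuming all values follow the `YYYY-MM-DDTHH:MM:SSZ` form shown in the sample output, parses the whole string and floors it to the hour, so an unexpected format raises a `ValueError` instead of producing a wrong timestamp. The helper name `to_hour_bucket` is made up for illustration.

```python
from datetime import datetime


def to_hour_bucket(measured_time):
    """Parse a string like '2022-03-12T23:59:35Z' and floor it to the full hour."""
    parsed = datetime.strptime(measured_time, '%Y-%m-%dT%H:%M:%SZ')
    floored = parsed.replace(minute=0, second=0, microsecond=0)
    return floored.strftime('%Y-%m-%d %H:%M:%S')


print(to_hour_bucket('2022-03-12T23:59:35Z'))  # prints 2022-03-12 23:00:00
```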

In [9]:
df_tms.head()

Unnamed: 0,index,measuredTime,roadStationId,oldName,name,sensorUnit,id,shortName,sensorValue,timeWindowStart,timeWindowEnd,lastUpdated,lastError,type,status,timestamp
0,0,2022-03-12T23:59:35Z,23575.0,keskinopeus_5min_liukuva_suunta1_VVAPAAS1,KESKINOPEUS_5MIN_LIUKUVA_SUUNTA1_VVAPAAS1,***,5158.0,LTila1,139.0,,,,,,,2022-03-12 23:00:00
1,1,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta1_MS1,OHITUKSET_5MIN_LIUKUVA_SUUNTA1_MS1,***,5164.0,MTila1,1.0,,,,,,,2022-03-12 23:00:00
2,2,2022-03-12T23:59:35Z,23575.0,ohitukset_60min_kiintea_suunta2_MS2,OHITUKSET_60MIN_KIINTEA_SUUNTA2_MS2,***,5071.0,MTil2,1.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00
3,3,2022-03-12T23:59:35Z,23575.0,keskinopeus_60min_kiintea_suunta2,KESKINOPEUS_60MIN_KIINTEA_SUUNTA2,km/h,5057.0,km/h2,103.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00
4,4,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta2_MS2,OHITUKSET_5MIN_LIUKUVA_SUUNTA2_MS2,***,5168.0,MTila2,1.0,,,,,,,2022-03-12 23:00:00


## First "pre" cleaning step: drop duplicates by `roadStationId`, `id` and `timestamp`

We had to do this at this point of the process because we found out over time that the data contains many duplicates, which significantly slowed down the following calculations.

By using these three attributes, we build a kind of __"primary key"__ to make sure that only duplicates are dropped.

In [10]:
df_tms = df_tms.drop_duplicates(subset=['roadStationId', 'id', 'timestamp'])
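What this key does can be seen on a tiny, made-up frame. Because `timestamp` is truncated to the hour, only the first reading per station, sensor, and hour survives; a later reading in the same hour is dropped even if its `sensorValue` differs. The values below are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'roadStationId': [23575, 23575, 23575, 23576],
    'id':            [5158, 5158, 5164, 5158],
    'timestamp':     ['2022-03-12 23:00:00'] * 4,
    'sensorValue':   [139, 141, 1, 90],
})

# Rows 0 and 1 share the same key; the default keep='first' retains row 0.
deduped = df_demo.drop_duplicates(subset=['roadStationId', 'id', 'timestamp'])
print(deduped)
```

If the last reading of each hour is preferred, `keep='last'` is the corresponding option of `drop_duplicates`.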

Compare the shape from before (780280 rows) with the shape now. We were able to remove a lot of duplicates from the DataFrame.

In [11]:
df_tms.shape

(18924, 16)

In [12]:
df_tms.head()

Unnamed: 0,index,measuredTime,roadStationId,oldName,name,sensorUnit,id,shortName,sensorValue,timeWindowStart,timeWindowEnd,lastUpdated,lastError,type,status,timestamp
0,0,2022-03-12T23:59:35Z,23575.0,keskinopeus_5min_liukuva_suunta1_VVAPAAS1,KESKINOPEUS_5MIN_LIUKUVA_SUUNTA1_VVAPAAS1,***,5158.0,LTila1,139.0,,,,,,,2022-03-12 23:00:00
1,1,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta1_MS1,OHITUKSET_5MIN_LIUKUVA_SUUNTA1_MS1,***,5164.0,MTila1,1.0,,,,,,,2022-03-12 23:00:00
2,2,2022-03-12T23:59:35Z,23575.0,ohitukset_60min_kiintea_suunta2_MS2,OHITUKSET_60MIN_KIINTEA_SUUNTA2_MS2,***,5071.0,MTil2,1.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00
3,3,2022-03-12T23:59:35Z,23575.0,keskinopeus_60min_kiintea_suunta2,KESKINOPEUS_60MIN_KIINTEA_SUUNTA2,km/h,5057.0,km/h2,103.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00
4,4,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta2_MS2,OHITUKSET_5MIN_LIUKUVA_SUUNTA2_MS2,***,5168.0,MTila2,1.0,,,,,,,2022-03-12 23:00:00


# Loading metadata

So far, the TMS data only comes with the value `roadStationId`, which does not directly give us a position in Finland. To fill in that missing information, the TMS API also provides metadata about the `roadStationId`s.
This metadata is also written to the same object storage, in a different bucket (`smart-city-tms-stations-metadata`). The value of `metastarttime` was set beforehand and is chosen to be earlier than `starttime`, because it can occur that TMS data for certain `roadStationId`s is available but no metadata was written within the same time window. So we choose a larger window to cover all possible `roadStationId`s.

In [13]:
# Metadata of the TMS stations
objs_tms_meta=get_s3_objects(s3=connection_TMS_trafficData_client,bucket="smart-city-tms-stations-metadata", last_modified_min=metastarttime, last_modified_max=endtime)

meta_frames = []
for obj in objs_tms_meta:
    body = connection_TMS_trafficData_client.get_object(Bucket='smart-city-tms-stations-metadata', Key=obj['Key'])['Body'].read()
    meta_frames.append(pd.read_json(body, lines=True))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df_tms_meta = pd.concat(meta_frames) if meta_frames else pd.DataFrame()
df_tms_meta = df_tms_meta.reset_index()

## Transform metadata

`df_tms_meta` contains a lot of information we don't care about; we only want the geometry data. From it, we extract the columns `['longitude', 'latitude']` to start building our DataFrame (`df`). Then we take another (helper) DataFrame, which only contains the various ids. We reset both indexes, JOIN them, and drop all duplicated `id`s (renamed to `roadStationId`) and empty rows.

In [None]:
# Data transformation
df = pd.json_normalize(df_tms_meta['geometry'])
print(df)
df = pd.DataFrame(df.iloc[:,0].tolist(), columns=['longitude', 'latitude', 'else'])
df = df[['latitude', 'longitude']]
df2 = df_tms_meta['id']
df2 = df2.reset_index()
df = df.reset_index()
df = df.join(df2.set_index('index'), on='index', how='inner')
df = df[['latitude', 'longitude', 'id']]
df = df.rename(columns={'id': 'roadStationId'})
df = df.drop_duplicates(subset=['roadStationId'])
df = df.dropna()  # assign the result; a bare df.dropna() does not modify df
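The same transformation can be sketched more compactly on made-up metadata. This assumes that each `geometry` entry is a GeoJSON-like dict whose `coordinates` list holds longitude, latitude, and a third value in that order; the real field layout is an assumption, and the third value (called `altitude` here) is hypothetical.

```python
import pandas as pd

# Hypothetical metadata rows; the real field layout is an assumption.
meta = pd.DataFrame({
    'id': [23575, 23576, 23575],
    'geometry': [
        {'coordinates': [26.54646, 60.486397, 0.0], 'type': 'Point'},
        {'coordinates': [26.926898, 60.514416, 0.0], 'type': 'Point'},
        {'coordinates': [26.54646, 60.486397, 0.0], 'type': 'Point'},
    ],
})

# Split the coordinate lists into columns, aligned with the metadata rows.
coords = pd.DataFrame(
    [g['coordinates'] for g in meta['geometry']],
    columns=['longitude', 'latitude', 'altitude'],
    index=meta.index,
)

stations = (
    coords[['latitude', 'longitude']]
    .assign(roadStationId=meta['id'])
    .drop_duplicates(subset=['roadStationId'])
    .dropna()
    .reset_index(drop=True)
)
print(stations)
```

Keeping the original index on `coords` lets pandas align the ids with the coordinates directly, so no helper DataFrame or join is needed.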

# Join metadata with TMS data

Now we have two DataFrames: `df_tms`, which contains all the sensor values, and `df`, which contains the geolocation of every `roadStationId`.

The next step brings these two DataFrames together with `pd.merge()`.

In [None]:
df_tms = pd.merge(df_tms, df, left_on='roadStationId', right_on='roadStationId', how='left')
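A left merge keeps every measurement row, but stations without metadata end up with empty coordinates. The sketch below, on made-up frames, uses the `validate` and `indicator` parameters of `pd.merge` to check both points; station `99999` is hypothetical and stands for a station without metadata.

```python
import pandas as pd

tms = pd.DataFrame({
    'roadStationId': [23575, 23575, 99999],
    'sensorValue': [139, 1, 50],
})
stations = pd.DataFrame({
    'roadStationId': [23575, 23576],
    'latitude': [60.486397, 60.514416],
    'longitude': [26.54646, 26.926898],
})

merged = pd.merge(
    tms, stations,
    on='roadStationId',
    how='left',
    validate='many_to_one',  # raises MergeError if a station appears twice in `stations`
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
)

# Measurement rows for which no station metadata was found.
missing = merged[merged['_merge'] == 'left_only']
print(len(merged), list(missing['roadStationId']))
```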

We make sure that no row went missing and that our DataFrame gained the two columns `[latitude, longitude]`.

In [16]:
df_tms.shape

(18924, 18)

In [17]:
df_tms.head()

Unnamed: 0,index,measuredTime,roadStationId,oldName,name,sensorUnit,id,shortName,sensorValue,timeWindowStart,timeWindowEnd,lastUpdated,lastError,type,status,timestamp,latitude,longitude
0,0,2022-03-12T23:59:35Z,23575.0,keskinopeus_5min_liukuva_suunta1_VVAPAAS1,KESKINOPEUS_5MIN_LIUKUVA_SUUNTA1_VVAPAAS1,***,5158.0,LTila1,139.0,,,,,,,2022-03-12 23:00:00,60.486397,26.54646
1,1,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta1_MS1,OHITUKSET_5MIN_LIUKUVA_SUUNTA1_MS1,***,5164.0,MTila1,1.0,,,,,,,2022-03-12 23:00:00,60.486397,26.54646
2,2,2022-03-12T23:59:35Z,23575.0,ohitukset_60min_kiintea_suunta2_MS2,OHITUKSET_60MIN_KIINTEA_SUUNTA2_MS2,***,5071.0,MTil2,1.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00,60.486397,26.54646
3,3,2022-03-12T23:59:35Z,23575.0,keskinopeus_60min_kiintea_suunta2,KESKINOPEUS_60MIN_KIINTEA_SUUNTA2,km/h,5057.0,km/h2,103.0,2022-03-12T22:00:00Z,2022-03-12T23:00:00Z,,,,,2022-03-12 23:00:00,60.486397,26.54646
4,4,2022-03-12T23:59:35Z,23575.0,ohitukset_5min_liukuva_suunta2_MS2,OHITUKSET_5MIN_LIUKUVA_SUUNTA2_MS2,***,5168.0,MTila2,1.0,,,,,,,2022-03-12 23:00:00,60.486397,26.54646


# TMS Data Cleaning (the real part 😉)

We kind of started the cleaning process a bit early by deleting the duplicated rows as described above. Now we take a closer look at the documentation and the features we have at hand.

## Delete all the irrelevant features 

After looking at the documentation __(https://www.digitraffic.fi/en/road-traffic/lam/)__ and deciding for ourselves which features are irrelevant because of the way they were measured or what they describe, we decided to drop the following ones:

- oldName
- sensorUnit
- shortName
- lastUpdated
- lastError
- type
- status
- timeWindowStart
- timeWindowEnd
- name
- index (the old index column created by `reset_index()`)

> ____‼️____ Note that you don't have to decide the same way we did. Think about what important features you want to keep.

In [18]:
df_tms_clean = df_tms.drop(['name', 'timeWindowStart', 'timeWindowEnd', 'oldName', 'sensorUnit', 'shortName', 'lastUpdated', 'lastError', 'type', 'status', 'index'], axis=1)

In [19]:
df_tms_clean.head()

Unnamed: 0,measuredTime,roadStationId,id,sensorValue,timestamp,latitude,longitude
0,2022-03-12T23:59:35Z,23575.0,5158.0,139.0,2022-03-12 23:00:00,60.486397,26.54646
1,2022-03-12T23:59:35Z,23575.0,5164.0,1.0,2022-03-12 23:00:00,60.486397,26.54646
2,2022-03-12T23:59:35Z,23575.0,5071.0,1.0,2022-03-12 23:00:00,60.486397,26.54646
3,2022-03-12T23:59:35Z,23575.0,5057.0,103.0,2022-03-12 23:00:00,60.486397,26.54646
4,2022-03-12T23:59:35Z,23575.0,5168.0,1.0,2022-03-12 23:00:00,60.486397,26.54646


# Subsample to the hourly data
(https://www.digitraffic.fi/en/road-traffic/lam/)

If you took a look at the documentation of the API, you found that there are IDs for five-minute and sixty-minute measurements. We decided to go with the higher aggregation of the data, which contains enough information for now.

All the IDs that contain the data for the sixty minute measurements: 
- 5056
- 5057
- 5054
- 5055
- 5067
- 5071

So we subset our `df_tms_clean` into a new DataFrame: `df_tms_hour`, now containing only the hour data.

In [20]:
df_tms_hour = df_tms_clean[df_tms_clean.id.isin([5056, 5057, 5054, 5055, 5067, 5071])]

In [21]:
df_tms_hour.head()

Unnamed: 0,measuredTime,roadStationId,id,sensorValue,timestamp,latitude,longitude
2,2022-03-12T23:59:35Z,23575.0,5071.0,1.0,2022-03-12 23:00:00,60.486397,26.54646
3,2022-03-12T23:59:35Z,23575.0,5057.0,103.0,2022-03-12 23:00:00,60.486397,26.54646
7,2022-03-12T23:59:35Z,23575.0,5067.0,2.0,2022-03-12 23:00:00,60.486397,26.54646
13,2022-03-12T23:59:45Z,23576.0,5054.0,69.0,2022-03-12 23:00:00,60.514416,26.926898
17,2022-03-12T23:59:45Z,23576.0,5071.0,3.0,2022-03-12 23:00:00,60.514416,26.926898


In [22]:
df_tms_hour = df_tms_hour.reset_index(drop=True)

In [23]:
df_tms_hour.head()

Unnamed: 0,measuredTime,roadStationId,id,sensorValue,timestamp,latitude,longitude
0,2022-03-12T23:59:35Z,23575.0,5071.0,1.0,2022-03-12 23:00:00,60.486397,26.54646
1,2022-03-12T23:59:35Z,23575.0,5057.0,103.0,2022-03-12 23:00:00,60.486397,26.54646
2,2022-03-12T23:59:35Z,23575.0,5067.0,2.0,2022-03-12 23:00:00,60.486397,26.54646
3,2022-03-12T23:59:45Z,23576.0,5054.0,69.0,2022-03-12 23:00:00,60.514416,26.926898
4,2022-03-12T23:59:45Z,23576.0,5071.0,3.0,2022-03-12 23:00:00,60.514416,26.926898


# Calculate geographical distance to our chosen central geolocation

You may have noticed that the TMS data is spread all across Finland. Knowing that there are other data sources at hand that focus on the Helsinki area, we decided to subsample TMS once again. We only wanted to keep TMS data from within roughly 20 km of the center of Helsinki.

To get a distance measurement from (longitude, latitude) locations, we needed a library called `geopy`. That is why we started by defining a custom environment. Had we installed the package at this point (which is possible), we would have had to restart the kernel, which would have meant unnecessarily recomputing all the work done so far.

So we went to Google Maps and chose a fixed (lat, long) pair as our central point of measurement. We then calculated the distance from every `roadStationId` to this point and extended `df_tms_hour` by one column that contains the calculated distance.

In [24]:
import geopy.distance

# Coordinates are given as (lat, long)!
# Fixed point: roughly the middle of Helsinki
fix = (60.192059, 24.945831)

# Distance in km from every row's station to the fixed point.
distances_km = []
for x in range(len(df_tms_hour)):
    km = geopy.distance.distance((df_tms_hour.latitude[x], df_tms_hour.longitude[x]), fix).km
    distances_km.append(km)

df_tms_hour['distance'] = distances_km
df_tms_hour

Unnamed: 0,measuredTime,roadStationId,id,sensorValue,timestamp,latitude,longitude,distance
0,2022-03-12T23:59:35Z,23575.0,5071.0,1.0,2022-03-12 23:00:00,60.486397,26.546460,94.283255
1,2022-03-12T23:59:35Z,23575.0,5057.0,103.0,2022-03-12 23:00:00,60.486397,26.546460,94.283255
2,2022-03-12T23:59:35Z,23575.0,5067.0,2.0,2022-03-12 23:00:00,60.486397,26.546460,94.283255
3,2022-03-12T23:59:45Z,23576.0,5054.0,69.0,2022-03-12 23:00:00,60.514416,26.926898,115.104429
4,2022-03-12T23:59:45Z,23576.0,5071.0,3.0,2022-03-12 23:00:00,60.514416,26.926898,115.104429
...,...,...,...,...,...,...,...,...
6331,2022-03-13T01:32:05Z,23142.0,5067.0,1.0,2022-03-13 01:00:00,60.736118,25.448445,66.627738
6332,2022-03-13T01:32:05Z,23142.0,5056.0,107.0,2022-03-13 01:00:00,60.736118,25.448445,66.627738
6333,2022-03-13T01:32:05Z,23142.0,5057.0,107.0,2022-03-13 01:00:00,60.736118,25.448445,66.627738
6334,2022-03-13T01:32:05Z,23142.0,5071.0,1.0,2022-03-13 01:00:00,60.736118,25.448445,66.627738
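As a plausibility check that needs no extra package, the distances can be approximated with the haversine formula. Note that this is a swapped-in technique, not what `geopy` does: `geopy.distance.distance` computes a geodesic on an ellipsoid, while the sketch below assumes a spherical Earth with a mean radius of 6371.0088 km, so the results differ by a fraction of a percent. The function name `haversine_km` is made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


fix = (60.192059, 24.945831)     # central point used above
station = (60.486397, 26.54646)  # roadStationId 23575 from the table

d = haversine_km(*fix, *station)
print(round(d, 1))
```

For a 20 km cut-off around Helsinki, a deviation of this size only matters for stations that lie almost exactly on the boundary.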


We then decided to limit the distance between a `roadStationId` and the central point to 20 km, and to delete the `distance` column afterwards, since we have no further use for it.

In [25]:
df_tms_core = df_tms_hour[df_tms_hour.distance <= 20]

In [26]:
df_tms_core = df_tms_core.reset_index(drop=True)

In [27]:
df_tms_core = df_tms_core.drop(['distance'], axis=1)


# Save the data back to our project

Once we think we are finished with cleaning our data (which doesn't mean you must end up with exactly the same result), we want to get the data out of the notebook and back into our project space. There we can use it as a data asset.

To do so, we use the Python library `project_lib` and import `Project` from it. This gives us the functionality needed to save the data back to our project space, where it can be found as a data asset. The DataFrame is converted into CSV format with the pandas.DataFrame method `to_csv()`.

In [None]:
# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='project-id', project_access_token='project-access-token')
pc = project.project_context


project.save_data(data=df_tms_core.to_csv(index=False),file_name=str(dateloading)+"-only_TMS_hour.csv",overwrite=True)
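What is handed to `save_data` can be inspected without a project token. The sketch below only covers the pandas side: called without a path, `to_csv()` returns the CSV as a string, and the file name is built from `dateloading`. The one-row frame is made up for illustration.

```python
import pandas as pd

dateloading = "2022-02-25"
df_demo = pd.DataFrame({'roadStationId': [23575.0], 'sensorValue': [103.0]})

csv_text = df_demo.to_csv(index=False)  # no path given, so a str is returned
file_name = str(dateloading) + "-only_TMS_hour.csv"

print(file_name)
print(csv_text)
```

With `index=False`, the row index is left out of the file, so the saved asset contains only the data columns.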