# Import required libraries


In [1]:
%pip install comet_ml --quiet
%pip install gdown --upgrade --quiet

Note: you may need to restart the kernel to use updated packages.


In [None]:
import numpy as np
import pandas as pd
import os
import gdown
import comet_ml
from comet_ml import Artifact, Experiment

## Read in raw data as pandas DataFrame for inspection

In [8]:
import pandas as pd

# Load the raw PaySim CSV into a DataFrame for inspection
raw_data = pd.read_csv('paysim-data.csv')


This dataset is fairly large: about 6.3M rows and roughly 500MB on disk.

Before we log this as an Artifact, let's convert it to a more lightweight file type: a parquet file.

Make sure you already have `pyarrow` installed before running the `to_parquet` method of the DataFrame.
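One way to check for `pyarrow` up front, rather than hitting an `ImportError` partway through the conversion, is sketched below. The helper name `has_module` is hypothetical and not part of pandas or Comet; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# pandas' to_parquet needs a Parquet engine such as pyarrow.
if not has_module("pyarrow"):
    print("pyarrow not found; install it before calling to_parquet")
```

If the check fails, installing the package with `%pip install pyarrow` in a notebook cell should be enough.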

In [10]:
raw_data.to_parquet('paysim-data.parquet.gzip', compression='gzip')
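To confirm that the conversion actually produced a smaller file, the on-disk sizes can be compared. This is a minimal sketch: `size_mb` is a hypothetical helper, and the paths are simply the file names used in this notebook.

```python
import os


def size_mb(path: str) -> float:
    """Return the size of the file at `path` in megabytes (10**6 bytes)."""
    return os.path.getsize(path) / 1e6


# Report the size of each file, skipping any that are not present.
for path in ("paysim-data.csv", "paysim-data.parquet.gzip"):
    if os.path.exists(path):
        print(f"{path}: {size_mb(path):.2f} MB")
    else:
        print(f"{path}: not found")
```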

# Creating an Artifact 

Let's go ahead and create an Artifact object!

To create and track an Artifact on Comet, we will need to: 

1. Initialize Comet and set your API key and project name
2. Create an Experiment object to log the Artifact
3. Create an Artifact Object and provide some metadata 
4. Add the dataset to the Artifact object
5. Upload the data to Comet using `experiment.log_artifact`

Let's go ahead and walk through these steps.


**Important Note:** Artifact names and types are user defined strings that are used for organization in the UI. You can set these to be anything you want, but it is recommended that you give them meaningful names so that it is easy to reason about what they contain. 


## 1. Initialize Comet and set your API key and project name

In [6]:
PROJECT_NAME = 'fraud-detection-demo'
comet_ml.init(workspace='team-comet-ml', project_name=PROJECT_NAME)

Please enter your Comet API key from https://www.comet.com/api/my/settings/
(api key may not show as you type)
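In non-interactive runs the prompt above is inconvenient. Comet can also read the key from the `COMET_API_KEY` environment variable, so a quick pre-flight check along these lines may help. The helper `api_key_configured` is hypothetical, and the sketch does not look at any Comet config file that might also hold a key.

```python
import os


def api_key_configured(env=os.environ) -> bool:
    """Return True if a non-empty COMET_API_KEY is present in `env`."""
    return bool(env.get("COMET_API_KEY", "").strip())


if api_key_configured():
    print("COMET_API_KEY is set in the environment.")
else:
    # A key stored in a Comet config file would not be detected here.
    print("COMET_API_KEY is not set; Comet may prompt for a key.")
```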


## 2. Create an Experiment object to log the Artifact

Comet Artifacts allow keeping track of assets beyond any particular experiment. 

You can keep track of Artifact versions, create many types of assets, manage them, and use them in any step in your ML pipelines---from training to production deployment.

In [None]:
experiment = Experiment(
    project_name=PROJECT_NAME
)
experiment.set_name('fetch-data')
experiment.add_tag('raw-data')

COMET INFO: Couldn't find a Git repository in '/content' and lookings in parents. You can override where Comet is looking for a Git Patch by setting the configuration `COMET_GIT_DIRECTORY`
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/team-comet-ml/fraud-detection-demo/441c93f3ca504064ad7871e97e4e1aaf



## 3. Create an Artifact object and provide some metadata 

Let's track our dataset with an Artifact! 

To create an Artifact, we have to provide a name for it. We can also provide some additional information about the Artifact, such as a type string that identifies what kind of Artifact we are uploading (a model, a dataset, etc.).

We can add alias identifiers to the Artifact, such as "raw-data", "test-data" or "staging-model". 

These Artifacts can then be retrieved by alias; we'll see how in other notebooks in this series.

Finally, we can attach a metadata dictionary containing any other information about the Artifact.
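A metadata dictionary can be assembled and sanity-checked before it is handed to the Artifact. The sketch below assumes the metadata should be JSON-encodable, which is our understanding of what Comet expects. The `build_metadata` helper and the `size_bytes` key are hypothetical additions for illustration.

```python
import json
import os


def build_metadata(path: str, source: str) -> dict:
    """Assemble a small metadata dict describing a data file."""
    metadata = {
        "filetype": "parquet",
        "original_source": source,
    }
    # Record the file size only when the file is actually present.
    if os.path.exists(path):
        metadata["size_bytes"] = os.path.getsize(path)
    # json.dumps raises TypeError if a value cannot be JSON-encoded.
    json.dumps(metadata)
    return metadata


metadata = build_metadata("paysim-data.parquet.gzip", "stakeholder Google Drive")
print(metadata)
```

Checking serializability locally surfaces a bad value (for example a NumPy array or a `datetime`) before any upload is attempted.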


In [None]:
artifact = Artifact(
    name="paysim-data",
    artifact_type="tabular dataset",
    aliases=["raw-data"],
    metadata={
        'filetype': 'parquet',
        'original_source': "Downloaded from stakeholder's Google Drive, file id = 1DtPhOdYXNsjW2EjysLvblSCBkCLJ",
    },
)

## 4. Add the dataset to the Artifact object


In [None]:
artifact.add("paysim-data.parquet.gzip")

## 5. Upload the data to Comet using `experiment.log_artifact`

In [None]:
experiment.log_artifact(artifact)
experiment.end()

COMET INFO: Artifact 'paysim-data' version 1.0.0 created (previous was: 0.0.1)
COMET INFO: Scheduling the upload of 1 assets for a size of 171.93 MB, this can take some time
COMET INFO: Artifact 'team-comet-ml/paysim-data:1.0.0' has started uploading asynchronously
COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.ml/team-comet-ml/fraud-detection-demo/441c93f3ca504064ad7871e97e4e1aaf
COMET INFO:   Others:
COMET INFO:     Name : fetch-data
COMET INFO:   Uploads:
COMET INFO:     artifact assets     : 1 (171.93 MB)
COMET INFO:     artifacts           : 1
COMET INFO:     environment details : 1
COMET INFO:     filename            : 1
COMET INFO:     installed packages  : 1
COMET INFO:     notebook            : 1
COMET INFO:     os packages         : 1
COMET INFO:     source_code         : 1
COMET INFO: 