# DVC Quickstart Assignment

Welcome! In this notebook, you'll get hands-on experience with DVC (Data Version Control), a tool for versioning datasets and models alongside your Git history.

**Objective:** Track a dataset and a model file using DVC in under 10 minutes.

## 1. Initialize DVC
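Let's initialize DVC in your project folder. By default, `dvc init` expects the folder to already be a Git repository. It creates a `.dvc/` directory that holds DVC's configuration, plus a `.dvcignore` file.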

In [1]:
# Initialize DVC in the current directory (assumes DVC is already installed)
!dvc init

Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/treeverse/dvc>
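
To confirm that initialization worked, a quick check like the following can help. This is a minimal sketch: `dvc_initialized` is a hypothetical helper name, and it assumes only that `dvc init` leaves a `.dvc/` directory in the project root.

```python
from pathlib import Path


def dvc_initialized(root="."):
    """Return True if `root` contains a .dvc/ directory (hypothetical helper)."""
    return (Path(root) / ".dvc").is_dir()


if __name__ == "__main__":
    # Checks the current working directory, i.e. the notebook's project folder.
    print("DVC initialized:", dvc_initialized())
```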


## 2. Track a Dataset with DVC
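Let's create a small sample dataset and put it under DVC tracking. `dvc add` writes a small pointer file, `sample_dataset.csv.dvc`, and lists the dataset itself in `.gitignore`, so Git versions only the pointer rather than the data.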

In [2]:
# Create a sample dataset file
import pandas as pd
df = pd.DataFrame({'feature1': [1, 2, 3], 'feature2': [4, 5, 6]})
df.to_csv('sample_dataset.csv', index=False)
print('sample_dataset.csv created')

sample_dataset.csv created


In [3]:
# Track the dataset file with DVC
!dvc add sample_dataset.csv


To track the changes with git, run:

	git add sample_dataset.csv.dvc .gitignore

To enable auto staging, run:

	dvc config core.autostage true
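
The pointer file that `dvc add` writes records a hash of the tracked file. The sketch below checks whether the current contents of a data file still match what its pointer file mentions. It rests on an assumption that may not hold for every DVC version: that the pointer stores a plain MD5 of the file's raw bytes. The helper names `file_md5` and `pointer_records_hash` are hypothetical, not part of DVC.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file's raw bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_records_hash(data_path):
    """Check whether <data_path>.dvc mentions the file's current MD5.

    Assumption: the pointer file stores a plain MD5 of the file contents.
    """
    pointer = Path(str(data_path) + ".dvc")
    return pointer.is_file() and file_md5(data_path) in pointer.read_text()
```

Calling `pointer_records_hash("sample_dataset.csv")` right after `dvc add` is one way to see the link between the data file and its pointer. Editing the CSV afterwards should make the check fail until the file is added again.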




## 3. Track a Model File with DVC
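Let's create a placeholder model file and put it under DVC tracking. The file here is just a few raw bytes standing in for a real serialized model; DVC tracks it the same way regardless of its contents.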

In [4]:
# Create a placeholder model file (raw bytes, not a real pickled model)
with open('model.pkl', 'wb') as f:
    f.write(b'Model binary content')
print('model.pkl created')

model.pkl created


In [5]:
# Track the model file with DVC
!dvc add model.pkl


To track the changes with git, run:

	git add model.pkl.dvc .gitignore

To enable auto staging, run:

	dvc config core.autostage true
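
Both `dvc add` runs suggest staging the new pointer file together with `.gitignore`. The sketch below collects those paths for several tracked files and stages them in a single `git add`. It assumes the tracked files sit in the project root (so one `.gitignore` covers them all) and that Git is installed. `files_to_stage` and `stage_in_git` are hypothetical helper names.

```python
import subprocess


def files_to_stage(tracked_paths):
    """List the .dvc pointer files plus .gitignore for files in the project root."""
    return [f"{path}.dvc" for path in tracked_paths] + [".gitignore"]


def stage_in_git(tracked_paths):
    """Run `git add` on the pointer files; requires Git and a Git repository."""
    subprocess.run(["git", "add", *files_to_stage(tracked_paths)], check=True)
```

For this notebook, `stage_in_git(["sample_dataset.csv", "model.pkl"])` would stage the same files as the two `git add` commands printed above.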




## 4. Check DVC Status
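Let's check the status of the tracked files. `dvc status` compares the files in your workspace with what the `.dvc` files record and reports any differences. Since nothing has changed since `dvc add`, it should report that everything is up to date.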

In [6]:
# Check DVC status for tracked files
!dvc status

Data and pipelines are up to date.
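
If you want to run the same check from Python, for example at the top of a training script, something like the following could work. It is a sketch under assumptions: it matches the exact message shown above, whose wording may differ between DVC versions, and the helper names `is_up_to_date` and `dvc_status_clean` are hypothetical.

```python
import subprocess

# Message taken from the `dvc status` output shown above (assumed stable).
UP_TO_DATE = "Data and pipelines are up to date."


def is_up_to_date(status_output):
    """Return True if the captured `dvc status` text reports no changes."""
    return UP_TO_DATE in status_output


def dvc_status_clean():
    """Run `dvc status` in the current directory and interpret its output."""
    result = subprocess.run(
        ["dvc", "status"], capture_output=True, text=True, check=True
    )
    # Look at both streams, since which one carries the message is not assumed.
    return is_up_to_date(result.stdout + result.stderr)
```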
