# Installing the Kaggle Package in Google Colab

The Kaggle API lets you work with the Kaggle platform from code or the command line, for example to download datasets and submit competition entries. To use it in Colab, first install the `kaggle` package, then authenticate with your Kaggle API token (the `kaggle.json` file generated from your Kaggle account settings).

Kaggle datasets come in various formats, typically structured for ease of use in data analysis and machine learning projects. Common formats include:

- **CSV (Comma-Separated Values):** Ideal for tabular data.
- **JSON (JavaScript Object Notation):** Suitable for nested or hierarchical data.
- **Image Files:** For computer vision tasks, datasets often include JPEG or PNG images.
- **Compressed Files:** Large datasets may be compressed in ZIP or 7z formats.

When working with these datasets in Colab, make sure you have the appropriate libraries (such as `pandas` for CSV or JSON files) to read and manipulate the data effectively.
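As a rough illustration of matching a reader to a file format, the sketch below picks a `pandas` loader from the file extension. The helper name `load_table` and the sample files are hypothetical and exist only for this demo; real datasets may need extra arguments such as an encoding or a separator.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a CSV or JSON file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported file type: {suffix}")


# Demo with small throwaway files (made-up data, not a Kaggle dataset)
demo_dir = Path(tempfile.mkdtemp())
csv_path = demo_dir / "sample.csv"
csv_path.write_text("id,value\n1,10\n2,20\n")
json_path = demo_dir / "sample.json"
json_path.write_text(json.dumps([{"id": 1, "value": 10}, {"id": 2, "value": 20}]))

csv_frame = load_table(csv_path)
json_frame = load_table(json_path)
print(csv_frame.shape, json_frame.shape)
```

Image files and multi-file archives are not handled here; they would need their own branch or a separate extraction step.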


In [1]:
# Install the Kaggle package to enable Kaggle API functionality
!pip install kaggle




In [9]:

# Install AutoGluon, used later to train a model on the Titanic data
!pip install autogluon

Collecting autogluon
  Downloading autogluon-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.core==1.1.1 (from autogluon.core[all]==1.1.1->autogluon)
  Downloading autogluon.core-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.features==1.1.1 (from autogluon)
  Downloading autogluon.features-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.tabular==1.1.1 (from autogluon.tabular[all]==1.1.1->autogluon)
  Downloading autogluon.tabular-1.1.1-py3-none-any.whl.metadata (13 kB)
Collecting autogluon.multimodal==1.1.1 (from autogluon)
  Downloading autogluon.multimodal-1.1.1-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.timeseries==1.1.1 (from autogluon.timeseries[all]==1.1.1->autogluon)
  Downloading autogluon.timeseries-1.1.1-py3-none-any.whl.metadata (12 kB)
Collecting scipy<1.13,>=1.5.4 (from autogluon.core==1.1.1->autogluon.core[all]==1.1.1->autogluon)
  Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metad

In [1]:
# Upload the kaggle.json file to Colab
from google.colab import files
files.upload()


Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"subhashpolisetti347","key":"<redacted>"}'}

In [2]:
# Create the Kaggle directory if it doesn't exist
!mkdir -p ~/.kaggle

# Move the kaggle.json file to this directory
!mv kaggle.json ~/.kaggle/

# Set the required permissions for the file
!chmod 600 ~/.kaggle/kaggle.json
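The same three steps can be approximated in plain Python, which may be convenient outside a notebook. This is a minimal sketch, not the notebook's method: the helper name `install_kaggle_token` is hypothetical, it copies the token rather than moving it, and the demo writes a placeholder token into a temporary directory instead of touching a real `~/.kaggle`.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_token(token_path, config_dir):
    """Copy an API token into config_dir and restrict it to owner read/write."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    target = config_dir / "kaggle.json"
    shutil.copyfile(token_path, target)            # copy instead of `mv`
    os.chmod(target, 0o600)                        # like `chmod 600`
    return target


# Demo in a temporary directory with a placeholder token (not a real key)
work_dir = Path(tempfile.mkdtemp())
fake_token = work_dir / "kaggle.json"
fake_token.write_text('{"username": "example-user", "key": "not-a-real-key"}')

installed = install_kaggle_token(fake_token, work_dir / ".kaggle")
mode = stat.S_IMODE(installed.stat().st_mode)
print(installed, oct(mode))
```

For real use, `config_dir` would be `Path.home() / ".kaggle"`. The restrictive file mode matters because the token grants access to your Kaggle account.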


In [3]:
# List the competitions currently available on Kaggle
!kaggle competitions list


ref                                                                                     deadline             category                reward  teamCount  userHasEntered  
--------------------------------------------------------------------------------------  -------------------  ---------------  -------------  ---------  --------------  
https://www.kaggle.com/competitions/arc-prize-2024                                      2024-11-10 23:59:00  Featured         1,100,000 Usd       1109           False  
https://www.kaggle.com/competitions/gemma-language-tuning                               2025-01-15 00:59:00  Analytics          150,000 Usd          0           False  
https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use       2024-12-19 23:59:00  Featured            60,000 Usd        915           False  
https://www.kaggle.com/competitions/eedi-mining-misconceptions-in-mathematics           2024-12-12 23:59:00  Featured            55,000 Usd        579     

In [16]:
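# Download the Titanic competition data (you must have accepted the competition rules on kaggle.com)
!kaggle competitions download -c titanic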



Downloading titanic.zip to /content
  0% 0.00/34.1k [00:00<?, ?B/s]
100% 34.1k/34.1k [00:00<00:00, 52.2MB/s]


In [17]:
# Unzip the Titanic dataset
!unzip titanic.zip


Archive:  titanic.zip
  inflating: gender_submission.csv   
  inflating: test.csv                
  inflating: train.csv               
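If you prefer to stay in Python rather than shell out to `unzip`, the standard-library `zipfile` module can do the same extraction. The sketch below is illustrative: `extract_archive` is a hypothetical helper, and the demo builds a tiny stand-in archive rather than using the real `titanic.zip`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir; return sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Demo: build a small stand-in archive (made-up contents)
demo_dir = Path(tempfile.mkdtemp())
demo_zip = demo_dir / "demo.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("train.csv", "PassengerId,Survived\n1,0\n")
    archive.writestr("test.csv", "PassengerId\n892\n")

extracted = extract_archive(demo_zip, demo_dir / "data")
print(extracted)
```

In the notebook, the equivalent call would be `extract_archive("titanic.zip", ".")`.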


In [18]:
import pandas as pd

# Load the train and test datasets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Display the first few rows of the training data
train_data.head()



|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


In [19]:
from autogluon.tabular import TabularPredictor

# Specify the label column ('Survived' is the classification target)
label_column = 'Survived'

# Train the AutoGluon predictor on the Titanic dataset
predictor = TabularPredictor(label=label_column).fit(train_data=train_data, time_limit=600)



No path specified. Models will be saved in: "AutogluonModels/ag-20241006_221134"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Memory Avail:       10.72 GB / 12.67 GB (84.6%)
Disk Space Avail:   61.77 GB / 107.72 GB (57.3%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets.
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='best_quality'   : Maximize accuracy. Default time_limit=3600.
	presets='high_quality'   : Strong accuracy with fast inference speed. Default time_limit=3600.
	presets='good_quality'   : Good accuracy with very fast inference speed. Default time_limit=3600.
	presets='medium_quality' : Fast training time, ideal for initial prototyping.
Be
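The training cell above fits on every labeled row. If you want your own accuracy estimate before submitting, one option is to set aside part of the training data first. The sketch below shows only the split, using `pandas`; `split_holdout` is a hypothetical helper, the 20% fraction is an arbitrary assumption, and the demo frame is made up.

```python
import pandas as pd


def split_holdout(frame, holdout_fraction=0.2, seed=0):
    """Randomly split frame into (train_part, holdout_part)."""
    holdout_part = frame.sample(frac=holdout_fraction, random_state=seed)
    train_part = frame.drop(index=holdout_part.index)
    return train_part, holdout_part


# Demo on a small made-up frame standing in for train_data
demo = pd.DataFrame({"PassengerId": range(1, 11), "Survived": [0, 1] * 5})
train_part, holdout_part = split_holdout(demo)
print(len(train_part), len(holdout_part))
```

In the notebook, you could fit the predictor on `train_part` and then call `predictor.evaluate(holdout_part)` to score it on rows it did not see during training.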

In [20]:
# Make predictions on the test data
predictions = predictor.predict(test_data)



In [21]:
# Prepare the submission DataFrame
submission = pd.DataFrame({'PassengerId': test_data['PassengerId'], 'Survived': predictions})

# Save the submission file to CSV
submission.to_csv('submission.csv', index=False)

# Display the first few rows of the submission DataFrame
submission.head()


|   | PassengerId | Survived |
|---|---|---|
| 0 | 892 | 0 |
| 1 | 893 | 1 |
| 2 | 894 | 0 |
| 3 | 895 | 0 |
| 4 | 896 | 0 |
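Before uploading, it can help to sanity-check the file's shape. The sketch below assumes a Titanic-style submission with exactly the two columns shown above and a 0/1 prediction in every row. `check_submission` is a hypothetical helper, not part of the Kaggle tooling, and the sample strings are made up.

```python
import csv
import io


def check_submission(csv_text, expected_rows):
    """Return a list of problems found in a Titanic-style submission file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header, body = rows[0], rows[1:]
    if header != ["PassengerId", "Survived"]:
        problems.append(f"unexpected header: {header}")
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    if any(len(row) != 2 or row[1] not in {"0", "1"} for row in body):
        problems.append("Survived must be 0 or 1 in every row")
    return problems


good = "PassengerId,Survived\n892,0\n893,1\n"
bad = "PassengerId,Survived\n892,0\n893,maybe\n"
print(check_submission(good, expected_rows=2))
print(check_submission(bad, expected_rows=3))
```

In the notebook, this could be called as `check_submission(open("submission.csv").read(), expected_rows=len(test_data))`; an empty list means no problems were found.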


In [22]:
# Submit the predictions file to the Titanic competition with a short message
!kaggle competitions submit -c titanic -f submission.csv -m "AutoGluon submission"



100% 2.77k/2.77k [00:00<00:00, 3.08kB/s]
Successfully submitted to Titanic - Machine Learning from Disaster