
# **What is Pandas?**


Pandas is a popular open-source Python library for data manipulation and analysis. Its core data structures, `Series` and `DataFrame`, make working with structured data, such as tabular or time-series data, more intuitive and efficient.

In [None]:
import numpy as np
import pandas as pd
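
As a quick illustration of the data structures described above, the following minimal sketch builds a small `DataFrame` by hand and runs two common operations on it. The column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data, invented only for this illustration
df = pd.DataFrame(
    {
        "item": ["a", "b", "c"],
        "price": [2.0, 4.0, 6.0],
    }
)

# Select rows with a boolean mask on one column
expensive = df[df["price"] > 3]

# Compute a summary statistic for a column
mean_price = df["price"].mean()

print(expensive)
print(mean_price)
```

The same two patterns (filtering with a mask, summarizing a column) apply unchanged to a `DataFrame` loaded from a CSV file.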

# **To load a file that is already in the Colab session**

In [None]:
train = pd.read_csv('/content/train.csv')

train.head()

# **To load data directly from Kaggle into Google Colab, you can follow these steps**


# **A)**

### **Obtaining Kaggle API Credentials:**

1. **Create a Kaggle Account**: If you don't have one already, sign up for a Kaggle account at [kaggle.com](https://www.kaggle.com/account/login).

2. **Enable Kaggle API Access**: Go to your Kaggle account settings page. Scroll down to the "API" section and click on "Create New API Token". This will download a file named `kaggle.json` to your local machine. This file contains your Kaggle API credentials.

# **B)**
### **Loading Kaggle API Credentials into Colab**:

1. **Open Google Colab**: Go to [colab.research.google.com](https://colab.research.google.com/).

2. **Create a New Notebook**: If you don't already have one, create a new notebook by clicking on "File" -> "New Notebook" -> "Python 3".

3. **Upload Kaggle API Credentials**:
   - Run the following code in a code cell:

     ```python
     from google.colab import files
     files.upload()
     ```

   - This will prompt you to upload files. Click on "Choose Files" and select the `kaggle.json` file you downloaded earlier.

4. **Verify Upload**: Once the upload finishes, you can confirm it succeeded by opening the Files panel in the Google Colab sidebar, where `kaggle.json` should appear under `/content`.


In [None]:
from google.colab import files
files.upload()
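
The upload can also be checked in code. In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its contents as bytes. The sketch below is one possible check, not part of the original steps: the helper name `check_kaggle_upload` is hypothetical, and a placeholder dictionary stands in for the real return value so the example runs outside Colab.

```python
import json


def check_kaggle_upload(uploaded):
    """Return True if `uploaded` contains a kaggle.json with both expected fields."""
    raw = uploaded.get("kaggle.json")
    if raw is None:
        return False
    try:
        creds = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    # A Kaggle API token file holds a "username" and a "key" entry
    return isinstance(creds, dict) and all(creds.get(k) for k in ("username", "key"))


# In Colab this would be: uploaded = files.upload()
# Placeholder contents (not real credentials) are used here instead
uploaded = {"kaggle.json": b'{"username": "example-user", "key": "not-a-real-key"}'}
print(check_kaggle_upload(uploaded))
```

The helper deliberately never prints the key itself, so the credentials do not end up in the notebook output.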

# **C)**

The cell below sets up the Kaggle CLI in the notebook environment. Line by line:

1. `!pip install kaggle`: Installs the Kaggle Python package with pip, which provides the `kaggle` command-line tool used in the following steps.

2. `!mkdir ~/.kaggle`: Creates a directory named `.kaggle` in the user's home directory, where the Kaggle CLI looks for its configuration files. If the directory already exists (for example, when the cell is re-run), `mkdir` reports an error that can safely be ignored.

3. `!cp kaggle.json ~/.kaggle/`: Copies `kaggle.json` into the new `.kaggle` directory. This file holds your Kaggle API credentials, which the CLI needs for authentication.

4. `!chmod 600 ~/.kaggle/kaggle.json`: Restricts the file's permissions so that only the owner can read and write it, which protects the credentials from other users on the machine.

In summary, the cell installs the Kaggle CLI, creates the configuration directory, copies the credentials into it, and locks down the file's permissions.

In [None]:
!pip install kaggle
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
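
For reference, the directory, copy, and permission steps can also be written in plain Python. This is a minimal sketch of an equivalent, not a replacement for the cell above; the function name `install_kaggle_credentials` and its `home` parameter are hypothetical and exist only so the example can be pointed at a scratch directory.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src, home=None):
    """Copy a kaggle.json file into <home>/.kaggle and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    # Behaves like `mkdir -p`: no error if the directory already exists
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)
    # Owner read/write only, the same mode as `chmod 600`
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target


# Example call in Colab, assuming kaggle.json is in the working directory:
# install_kaggle_credentials("kaggle.json")
```

Unlike the shell version, this sketch can be re-run without an error from the directory step.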



# **D)**

To download a **dataset** from Kaggle using the Kaggle API, you can follow this general format:

```python
!kaggle datasets download -d dataset-slug
```

Where:
- `!kaggle`: Executes the Kaggle CLI.
- `datasets download`: Specifies that you want to download a dataset.
- `-d dataset-slug`: Specifies the dataset you want to download using its slug (a unique identifier for the dataset).

If you're downloading a dataset from a **competition**, the format is similar:

```python
!kaggle competitions download -c competition-slug
```

Where:
- `competitions download`: Specifies that you want to download the data files of a competition.
- `-c competition-slug`: Specifies the competition you want to download from using its slug.

Remember to replace `dataset-slug` with the actual slug of the dataset and `competition-slug` with the actual slug of the competition. Additionally, ensure you have set up your Kaggle API credentials and have installed the Kaggle CLI in your environment before executing these commands.
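
The two command formats above can be assembled programmatically when a notebook needs to download several sources. The sketch below only builds the argument list. The helper name `kaggle_download_command` is hypothetical, and the optional `-p` flag (download directory) is included on the assumption that the installed CLI version supports it.

```python
import subprocess


def kaggle_download_command(slug, kind="dataset", dest=None):
    """Build the Kaggle CLI argument list for a dataset or competition download."""
    if kind == "dataset":
        cmd = ["kaggle", "datasets", "download", "-d", slug]
    elif kind == "competition":
        cmd = ["kaggle", "competitions", "download", "-c", slug]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    if dest is not None:
        # Assumed flag: -p sets the directory the archive is saved to
        cmd += ["-p", str(dest)]
    return cmd


cmd = kaggle_download_command("dataset-slug")
print(" ".join(cmd))

# To actually run it (requires the Kaggle CLI and valid credentials):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a slug or path contains unusual characters.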

As a competition example, the WiDS Datathon 2024 Challenge 2 page is located at https://www.kaggle.com/competitions/widsdatathon2024-challenge2, so its competition slug is "widsdatathon2024-challenge2". Competition downloads generally require that you have joined the competition and accepted its rules on Kaggle first.

In [None]:
!kaggle competitions download -c widsdatathon2024-challenge2

Downloading widsdatathon2024-challenge2.zip to /content
  0% 0.00/5.71M [00:00<?, ?B/s]
100% 5.71M/5.71M [00:00<00:00, 72.4MB/s]


As a dataset example, the Kaggle dataset page located at https://www.kaggle.com/datasets/mohithsairamreddy/salary-data has the dataset slug "mohithsairamreddy/salary-data", that is, the owner name followed by the dataset name.

In [None]:
!kaggle datasets download -d mohithsairamreddy/salary-data


Downloading salary-data.zip to /content
  0% 0.00/16.6k [00:00<?, ?B/s]
100% 16.6k/16.6k [00:00<00:00, 31.5MB/s]


# **E)**

Downloads typically arrive as a zip archive, so the next step is to unzip the file. The `!ls` command afterwards lists the working directory so you can confirm which files were extracted.

In [None]:
!unzip widsdatathon2024-challenge2.zip
!ls

Archive:  widsdatathon2024-challenge2.zip
  inflating: solution_template.csv   
  inflating: test.csv                
  inflating: train.csv               
kaggle.json	 sample_data		test.csv   widsdatathon2024-challenge2.zip
salary-data.zip  solution_template.csv	train.csv
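
The extraction can also be done from Python with the standard-library `zipfile` module, which is convenient when the list of extracted file names is needed later in the notebook. This is a minimal sketch; the helper name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="."):
    """Extract every member of `zip_path` into `dest` and return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    return names


# Example call, assuming the archive is in the working directory:
# extract_archive("widsdatathon2024-challenge2.zip")
```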


# **F)**

Then load the files into the Google Colab environment for further analysis.

In [None]:
train = pd.read_csv('/content/train.csv')

train.head()

| Unnamed: 0 | patient_id | patient_race | payer_type | patient_state | patient_zip3 | Region | Division | patient_age | patient_gender | bmi | ... | Average of Apr-18 | Average of May-18 | Average of Jun-18 | Average of Jul-18 | Average of Aug-18 | Average of Sep-18 | Average of Oct-18 | Average of Nov-18 | Average of Dec-18 | metastatic_diagnosis_period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 268700 |  | COMMERCIAL | AR | 724 | South | West South Central | 39 | F |  | ... | 52.55 | 74.77 | 79.96 | 81.69 | 78.3 | 74.56 | 59.98 | 42.98 | 41.18 | 191 |
| 1 | 484983 | White |  | IL | 629 | Midwest | East North Central | 55 | F | 35.36 | ... | 49.3 | 72.87 | 77.4 | 77.43 | 75.83 | 72.64 | 58.36 | 39.68 | 39.71 | 33 |
| 2 | 277055 |  | COMMERCIAL | CA | 925 | West | Pacific | 59 | F |  | ... | 68.5 | 70.31 | 78.61 | 87.24 | 85.52 | 80.75 | 70.81 | 62.67 | 55.58 | 157 |
| 3 | 320055 | Hispanic | MEDICAID | CA | 900 | West | Pacific | 59 | F |  | ... | 63.34 | 63.1 | 67.45 | 75.86 | 75.24 | 71.1 | 68.95 | 65.46 | 59.46 | 146 |
| 4 | 190386 |  | COMMERCIAL | CA | 934 | West | Pacific | 71 | F |  | ... | 59.45 | 60.24 | 64.77 | 69.81 | 70.13 | 68.1 | 65.38 | 60.72 | 54.08 | 286 |
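
The preview shows blank entries in `patient_race`, `payer_type`, and `bmi`, so a natural first check is to count missing values per column. The sketch below rebuilds only the five displayed rows for four of the visible columns, so it runs without the downloaded file. It assumes the blank cells are missing values (`NaN`); on the real data the same calls would be applied to `train`.

```python
import numpy as np
import pandas as pd

# The five previewed rows, restricted to a few visible columns.
# Assumption: blank cells in the preview are missing values (NaN).
sample = pd.DataFrame(
    {
        "patient_id": [268700, 484983, 277055, 320055, 190386],
        "patient_race": [np.nan, "White", np.nan, "Hispanic", np.nan],
        "payer_type": ["COMMERCIAL", np.nan, "COMMERCIAL", "MEDICAID", "COMMERCIAL"],
        "bmi": [np.nan, 35.36, np.nan, np.nan, np.nan],
    }
)

# Number of rows and columns
print(sample.shape)

# Count of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Five rows say little about the full dataset, so these counts only indicate which columns deserve a closer look once `train.isna().sum()` is run on the complete file.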
