# More Methods to Load Data in Google Colab and Jupyter Notebook

# 1. Loading Data from a URL or API
If your data is hosted on a website or exposed through an API, you can read it directly with pandas or fetch it with requests or urllib:

**Load CSV File from a URL:**

In [None]:
import pandas as pd
url = 'https://example.com/data.csv'
df = pd.read_csv(url)


**Load Data Using requests:**

In [None]:
import requests
response = requests.get('https://example.com/data', timeout=10)
response.raise_for_status()  # Raise an error for 4xx/5xx responses
data = response.json()  # If the data is in JSON format
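
Since urllib is mentioned above but not shown, here is a minimal sketch of the same idea using only the standard library. The helper name `fetch_csv_rows` and the 10-second timeout are assumptions for illustration, not part of the original notebook.

```python
import csv
import io
from urllib.request import urlopen


def fetch_csv_rows(url, timeout=10):
    """Download a CSV file and return its rows as a list of dicts."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset)
    # DictReader uses the first line as the column names.
    return list(csv.DictReader(io.StringIO(text)))
```

A call such as `rows = fetch_csv_rows('https://example.com/data.csv')` should return one dict per data row; every value stays a string, so numeric columns still need converting.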


# 2. Using gdown to Download Files from Google Drive
If your files are on Google Drive but you don’t want to mount the drive, you can use gdown to download them directly.

**Install gdown:**

In [None]:
!pip install gdown


**Download a File from Google Drive (use the file ID from the shared link):**

In [None]:
!gdown 'https://drive.google.com/uc?id=your_file_id'
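
The file ID has to be copied out of the shared link by hand. As a rough sketch, the helpers below pull the ID out of two common link shapes (`/file/d/<id>/...` and `?id=<id>`) and build the `uc?id=` form used above. The names `drive_file_id` and `gdown_url` are hypothetical, and other link shapes may exist that these patterns do not cover.

```python
import re

# Patterns for two common Drive link shapes: /file/d/<id>/... and ?id=<id>
_ID_PATTERNS = (
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def drive_file_id(link):
    """Return the file ID embedded in a Google Drive link (hypothetical helper)."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(link)
        if match:
            return match.group(1)
    raise ValueError(f"No file ID found in: {link}")


def gdown_url(link):
    """Build the uc?id=... form that the gdown command above uses."""
    return f"https://drive.google.com/uc?id={drive_file_id(link)}"
```

The result of `gdown_url(shared_link)` can then be passed to `!gdown` in place of the hand-edited URL.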


# 3. Loading Data from Databases
You can connect to various databases using libraries like sqlite3, psycopg2, or SQLAlchemy.

**Connecting to SQLite:**

In [None]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT * FROM table_name", conn)
conn.close()  # Release the database file once the data is loaded
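
To try the SQLite workflow without an existing database file, the sketch below builds a small in-memory database and filters it with a parameterized query. The `measurements` table and its values are made up for illustration.

```python
import sqlite3
from contextlib import closing

# An in-memory database stands in for 'database.db' so the sketch is self-contained.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE measurements (city TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("Oslo", 4.5), ("Cairo", 29.0), ("Lima", 18.2)],
    )
    conn.commit()

    # The ? placeholder lets sqlite3 bind the value instead of
    # formatting it into the SQL string.
    cursor = conn.execute(
        "SELECT city, temp_c FROM measurements WHERE temp_c > ? ORDER BY temp_c",
        (10,),
    )
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()

print(columns)
print(rows)
```

The same placeholder style works with pandas through the `params` argument, for example `pd.read_sql_query(sql, conn, params=(10,))`.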


**Connecting to PostgreSQL:**

In [None]:
import psycopg2
import pandas as pd

conn = psycopg2.connect(
    host="localhost",
    database="your_db",
    user="your_user",
    password="your_password"
)
df = pd.read_sql_query("SELECT * FROM table_name", conn)
conn.close()  # Close the connection once the data is loaded
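
Hardcoding a password in a notebook is easy to leak when the notebook is shared. One possible alternative, sketched here, reads the settings from environment variables. The helper name `pg_connect_kwargs` is hypothetical; the variable names follow the usual PostgreSQL client conventions (`PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`), and the defaults are the placeholders from the cell above.

```python
import os


def pg_connect_kwargs(env=None):
    """Collect PostgreSQL connection settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "database": env.get("PGDATABASE", "your_db"),
        "user": env.get("PGUSER", "your_user"),
        # No default for the password: raise KeyError if it is missing.
        "password": env["PGPASSWORD"],
    }
```

The returned dict can be unpacked into the connect call, as in `conn = psycopg2.connect(**pg_connect_kwargs())`.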


# 4. Using Google Sheets
To load data from Google Sheets, you can use the Google Sheets API or simpler libraries like gspread.

**Install gspread:**

In [None]:
!pip install --upgrade gspread


**Then use the following code to load data:**

In [None]:
import gspread
from google.colab import auth
from google.auth import default

# Authenticate (google.auth replaces the deprecated oauth2client)
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)

# Open Google Sheet
spreadsheet = gc.open('your_google_sheet_name')
worksheet = spreadsheet.sheet1

# Fetch data
rows = worksheet.get_all_values()
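
`get_all_values()` returns a list of lists, with the header row first if the sheet has one. The sketch below shows one way to turn that shape into records keyed by column name; `rows_to_records` is a hypothetical helper and the sample data is invented.

```python
def rows_to_records(rows):
    """Turn [[header...], [values...], ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample data shaped like the output of worksheet.get_all_values().
sample_rows = [["name", "score"], ["ada", "90"], ["lin", "85"]]
records = rows_to_records(sample_rows)
print(records)
```

For a DataFrame, `pd.DataFrame(rows[1:], columns=rows[0])` does the same job in one line.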


# 5. Using Kaggle to Download Datasets
If you are using datasets from Kaggle, you can download them directly in Google Colab.

**Install Kaggle Library:**

In [None]:
!pip install kaggle


**Upload Kaggle API Token and Download Dataset:**

In [None]:
from google.colab import files
files.upload()  # Upload kaggle.json

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download Dataset
!kaggle datasets download -d dataset-owner/dataset-name
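
The download normally arrives as a zip archive in the working directory. Assuming that is the case here (the exact file name depends on the dataset), a small helper like the hypothetical `extract_dataset` below can unpack it and list the CSV files it contains.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir="data"):
    """Unzip a downloaded archive and return the extracted CSV paths, sorted."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # rglob also finds CSV files nested in subfolders of the archive.
    return sorted(target.rglob("*.csv"))
```

Each returned path can then be passed to `pd.read_csv`.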


# 6. Using FTP to Fetch Data
If your data is hosted on an FTP server, you can use the ftplib library to connect and fetch the data:

**Connecting to an FTP Server:**

In [None]:
from ftplib import FTP

ftp = FTP('ftp.example.com')
ftp.login(user='username', passwd='password')

# Download file
with open('local_filename', 'wb') as f:
    ftp.retrbinary('RETR remote_filename', f.write)

# Close the connection
ftp.quit()
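
The same steps can be wrapped in functions so the session is always closed, even if the download fails. This is a sketch under the assumption that a plain username and password login is enough; the function names are hypothetical.

```python
from ftplib import FTP


def download_file(ftp, remote_name, local_path):
    """Save one remote file to local_path using an already logged-in FTP object."""
    with open(local_path, "wb") as f:
        ftp.retrbinary(f"RETR {remote_name}", f.write)
    return local_path


def fetch_from_ftp(host, user, passwd, remote_name, local_path):
    """Connect, log in, download one file, and close the session."""
    # FTP objects are context managers; leaving the block closes the connection.
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=passwd)
        return download_file(ftp, remote_name, local_path)
```

Splitting the download step out of the connection handling also makes it possible to reuse one login for several files.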


# 7. Using Dropbox to Download Files
You can use the Dropbox API or libraries like dropbox to access files stored on Dropbox.

**Install Dropbox Library:**

In [None]:
!pip install dropbox


**Download File from Dropbox:**

In [None]:
import dropbox

dbx = dropbox.Dropbox('your_access_token')

with open("filename", "wb") as f:
    metadata, res = dbx.files_download('/dropbox_path/filename')
    f.write(res.content)


# Additional Methods for Uploading and Loading Data in Google Colab and Jupyter Notebook

# 1. Using Azure Blob Storage or AWS S3
If your data is stored in cloud storage services like Azure or AWS, you can use the respective libraries to fetch data:

**Azure Blob Storage:**

In [None]:
from azure.storage.blob import BlobServiceClient

connect_str = "your_connection_string"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="your_container", blob="your_blob")

with open("filename", "wb") as download_file:
    download_file.write(blob_client.download_blob().readall())


**AWS S3:**

In [None]:
import boto3

s3 = boto3.client('s3')
s3.download_file('your_bucket', 'your_key', 'filename')


# 2. Using BigQuery
If your data is stored in Google BigQuery, you can use the google-cloud-bigquery library:

In [None]:
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT * FROM `your_project.your_dataset.your_table`"
df = client.query(query).to_dataframe()


# 3. Using HDF5 and Parquet
If your data is stored in binary formats like HDF5 or Parquet, you can use h5py or pandas:

**Load HDF5 File:**

In [None]:
import h5py

with h5py.File('filename.h5', 'r') as f:
    data = f['dataset_name'][:]


**Load Parquet File:**

In [None]:
import pandas as pd

df = pd.read_parquet('filename.parquet')


# 4. Using Google Cloud Storage (GCS)
If you're using Google Cloud Storage (GCS), you can use the google-cloud-storage library:

In [None]:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('your_bucket_name')
blob = bucket.blob('your_blob_name')
blob.download_to_filename('filename')
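
GCS objects are often referred to by a single `gs://bucket/path` URI, while the client calls above take the bucket name and blob name separately. The hypothetical helper below sketches one way to split such a URI using only the standard library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a gs:// URI: {uri}")
    # The parsed path keeps its leading slash; blob names do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")
```

The two parts can be passed on as `client.get_bucket(bucket_name)` and `bucket.blob(blob_name)`.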


# 5. Using Web Scraping
If your data comes from websites, you can use web scraping tools like BeautifulSoup or Selenium to fetch the data:

**Using BeautifulSoup:**

In [None]:
import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
data = soup.find_all('your_target_element')
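
When the target is an HTML table, the cells usually need to be grouped by row before they are useful. The sketch below shows that grouping with the standard library's `html.parser` rather than BeautifulSoup, so it runs without extra installs; the `TableCellParser` class and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only text that appears inside a cell is kept.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = (
    "<table><tr><th>name</th><th>score</th></tr>"
    "<tr><td>ada</td><td>90</td></tr></table>"
)
parser = TableCellParser()
parser.feed(html)
print(parser.rows)
```

With a real page, `parser.feed(response.text)` would take the place of the sample string.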


# 6. Loading Excel Files
To load Excel data, you can use libraries like pandas or openpyxl:

**Load Excel File:**

In [None]:
import pandas as pd

df = pd.read_excel('filename.xlsx', sheet_name='Sheet1')


# 7. Using Google Cloud Datalab
Google Cloud Datalab is an advanced interactive tool designed for data analysis, machine learning, and visualization. It integrates with Google Cloud services such as BigQuery, Cloud Storage, and Machine Learning Engine, making it a powerful platform for handling large datasets and running analytics in the cloud.

**To use Google Cloud Datalab for loading and analyzing data, follow these steps:**

# Step 1: Set Up Datalab
You need to set up and deploy Datalab within your Google Cloud Project:

In [None]:
# Run these in a terminal or Cloud Shell, not in a notebook cell
gcloud components install datalab
datalab create my-datalab-instance


# Step 2: Access Datalab
Once deployed, access Datalab in your browser, where it will open a Jupyter Notebook-like interface for running Python and SQL code.

# Step 3: Loading Data from BigQuery
In Datalab, you can run BigQuery queries directly:

In [None]:
%%bigquery
SELECT * FROM `your_project.your_dataset.your_table`


# Step 4: Loading Data from Google Cloud Storage
To load data from Google Cloud Storage (GCS):

In [None]:
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('your_bucket_name')
blob = bucket.blob('your_blob_name')
blob.download_to_filename('filename')


# Step 5: Machine Learning with TensorFlow
You can also run machine learning models using TensorFlow or other libraries integrated with Google Cloud’s Machine Learning Engine:

In [None]:
import tensorflow as tf

# Load data, train model, etc.


# 8. Using Google Sheets

Google Sheets is another powerful tool that allows you to store and work with data in a spreadsheet format. You can easily load and manipulate data from Google Sheets in Google Colab or Jupyter Notebook using the gspread library or pandas.

Here’s how to load data from Google Sheets:

**Step 1: Install gspread and Authenticate**
You need to install gspread and authenticate:

In [None]:
!pip install --upgrade gspread gspread_dataframe oauth2client


After that, you need to set up authentication using the Google Cloud console by creating a project and enabling the Google Sheets API. Once you've obtained the credentials, you can upload them to Colab.

**Step 2: Authenticate and Connect to Google Sheets**
Use the following code to authenticate:

In [None]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Define the scope
scope = [
    "https://spreadsheets.google.com/feeds",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/drive",
]

# Add credentials to the account
creds = ServiceAccountCredentials.from_json_keyfile_name('your_credentials_file.json', scope)

# Authorize the client
client = gspread.authorize(creds)

# Access the Google Sheet
sheet = client.open('Your_Spreadsheet_Name').sheet1

# Get all the records from the sheet
data = sheet.get_all_records()

# Display the data
print(data)


# Step 3: Using Pandas to Read Google Sheets
You can also load Google Sheets data directly into a pandas DataFrame:

In [None]:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate and access the sheet (reuses the scope list defined in Step 2)
creds = ServiceAccountCredentials.from_json_keyfile_name('your_credentials_file.json', scope)
client = gspread.authorize(creds)

# Access the Google Sheet
sheet = client.open('Your_Spreadsheet_Name').sheet1

# Load the data into a pandas DataFrame
df = get_as_dataframe(sheet)
print(df.head())


# Step 4: Reading Data via a CSV Export URL with pandas
Alternatively, you can use pandas to fetch data directly from Google Sheets without gspread. This works only when the sheet is shared so that anyone with the link can view it:

In [None]:
!pip install --upgrade pandas

import pandas as pd

# Google Sheets URL
sheet_url = 'https://docs.google.com/spreadsheets/d/YOUR_SPREADSHEET_ID/edit#gid=0'

# Load the sheet into a pandas DataFrame
csv_export_url = sheet_url.replace('/edit#gid=', '/gviz/tq?tqx=out:csv&gid=')
df = pd.read_csv(csv_export_url)

# View the first few rows
print(df.head())
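
The string replacement above only works when the URL ends in exactly `/edit#gid=...`. Links copied from the Share dialog often look different (for example `/edit?usp=sharing`), so a slightly more tolerant sketch is shown below. The helper name `sheet_csv_url` is hypothetical, and falling back to `gid=0` when the URL names no tab is an assumption.

```python
import re


def sheet_csv_url(sheet_url):
    """Build a gviz CSV export URL from a Google Sheets browser URL."""
    id_match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if not id_match:
        raise ValueError(f"No spreadsheet ID found in: {sheet_url}")
    # Fall back to gid=0 when the URL does not name a tab.
    gid_match = re.search(r"[#?&]gid=(\d+)", sheet_url)
    gid = gid_match.group(1) if gid_match else "0"
    return (
        f"https://docs.google.com/spreadsheets/d/{id_match.group(1)}"
        f"/gviz/tq?tqx=out:csv&gid={gid}"
    )
```

The result can be passed straight to pandas, as in `df = pd.read_csv(sheet_csv_url(sheet_url))`.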
