# **Big Data with Google Colab using Python**
This notebook introduces **Big Data Analysis** in **Google Colab** using Python libraries such as **Dask, Pandas, and Google BigQuery**.

## **1. Setting up Google Colab for Big Data**
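Google Colab provides a free, cloud-hosted Jupyter environment. The CPU model and the amount of RAM vary by runtime and session, so check what the current runtime offers before sizing a workload.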

In [None]:
# Check system specifications
!grep "model name" /proc/cpuinfo | uniq
!grep "MemTotal" /proc/meminfo
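
The same check can be done from Python, which makes the values available to later cells. The following is a minimal sketch, assuming a Linux runtime that exposes `/proc/meminfo` (as Colab does); `mem_total_gib` is an illustrative helper name, not part of any library.

```python
import os


def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib / 1024**2
    return None  # no MemTotal line found


print("Logical CPUs:", os.cpu_count())

# /proc/meminfo exists only on Linux, so guard the read.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"Total RAM: {mem_total_gib(f.read()):.1f} GiB")
```

Comparing the reported RAM with the size of the data on disk gives a rough idea of whether plain Pandas is enough or a partitioned approach is needed.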

## **2. Installing and Using Dask for Parallel Computing**
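Dask is a parallel computing library for Python. A Dask DataFrame is split into partitions, each of which is an ordinary Pandas DataFrame, and operations on it are evaluated lazily, so the full dataset does not have to fit in memory at once.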

In [None]:
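!pip install "dask[dataframe]"
import dask.dataframe as dd

# Example: load a CSV lazily. This is a small sample file (about 200 rows) that
# stands in for a large dataset; read_csv also accepts glob patterns such as "data/*.csv".
url = "https://people.sc.fsu.edu/~jburkardt/data/csv/hw_200.csv"
df = dd.read_csv(url)
df.head()  # reads only the first partition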

## **3. Processing Large Datasets with Pandas and Dask**
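Let's analyze the dataset with **Dask**. Each operation is recorded rather than run immediately; calling `.compute()` executes it across the partitions and returns the result as an ordinary **Pandas** object.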

In [None]:
# Compute basic statistics
df.describe().compute()
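
Statistics such as the mean can be computed over partitions because they reduce to per-chunk sums and counts that are combined at the end. The sketch below shows that idea with plain Pandas and `read_csv(chunksize=...)`; it illustrates the principle and is not a description of Dask's internals. `chunked_mean` and the in-memory CSV are illustrative assumptions.

```python
import io

import pandas as pd


def chunked_mean(csv_source, column, chunksize=1000):
    """Mean of one column, reading the CSV a fixed number of rows at a time."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        total += chunk[column].sum()    # partial sum for this chunk
        count += chunk[column].count()  # non-null rows in this chunk
    return float(total / count) if count else float("nan")


# A tiny in-memory CSV stands in for a file too large to load at once.
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 11))
result = chunked_mean(io.StringIO(csv_text), "value", chunksize=3)
print(result)
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the file size.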

## **4. Data Visualization for Big Data**
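Plotting libraries such as Matplotlib work on in-memory data, so a large dataset should be reduced, by sampling or aggregating, before it is plotted.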

In [None]:
import matplotlib.pyplot as plt
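# .compute() loads the whole dataset into memory, which is fine for this small sample.
# For larger data, plot a sample instead, e.g. df.sample(frac=0.01).compute().
df.compute().hist(figsize=(10, 5))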
plt.show()
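
When even a sample is inconvenient, a histogram can be aggregated chunk by chunk, because bin counts over fixed edges simply add up. The following is a minimal sketch using NumPy; `chunked_histogram` is an illustrative helper, the random data stands in for a real column, and the value range is assumed to be known in advance.

```python
import numpy as np


def chunked_histogram(chunks, bins, value_range):
    """Accumulate histogram counts over an iterable of 1-D arrays."""
    # All chunks share the same edges so their counts can be summed.
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    counts = np.zeros(bins, dtype=np.int64)
    for chunk in chunks:
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        counts += chunk_counts
    return counts, edges


rng = np.random.default_rng(0)
data = rng.normal(size=10_000)   # stand-in for one column of a large dataset
chunks = np.array_split(data, 10)  # stand-in for partitions read one at a time

counts, edges = chunked_histogram(chunks, bins=20, value_range=(-4.0, 4.0))
print(counts)
```

The result can be drawn with `plt.bar(edges[:-1], counts, width=np.diff(edges), align="edge")`. Values outside `value_range` are not counted, so the range should be chosen to cover the data.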

## **5. Using Google BigQuery for Large-Scale Data Analysis**
Google BigQuery runs SQL queries over very large tables on Google's infrastructure, so only the (usually small) result has to be returned to the notebook.

In [None]:
from google.colab import auth
auth.authenticate_user()

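from google.cloud import bigquery
# Replace the placeholder with your own Google Cloud project ID; query usage is attributed to it.
client = bigquery.Client(project="your-project-id")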

query = """
SELECT COUNT(*) AS total_trips, AVG(trip_distance) AS avg_distance
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`
WHERE pickup_datetime >= '2015-01-01' AND pickup_datetime < '2016-01-01'
"""

df_bigquery = client.query(query).to_dataframe()
df_bigquery
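
Before running a query against a large table, it can help to ask BigQuery how much data the query would scan. The sketch below uses the client library's dry-run option (`QueryJobConfig(dry_run=True)`) and the job's `total_bytes_processed` attribute; the helper names `dry_run_config`, `estimate_scan_bytes`, and `format_bytes` are illustrative, not part of the library.

```python
def dry_run_config():
    """Build a job config that validates a query without executing it."""
    from google.cloud import bigquery  # imported lazily; preinstalled on Colab

    return bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)


def estimate_scan_bytes(client, sql, job_config):
    """Return the number of bytes BigQuery reports the query would process."""
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


def format_bytes(num_bytes):
    """Render a byte count with a binary unit suffix."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

In the notebook this could be called as `format_bytes(estimate_scan_bytes(client, query, dry_run_config()))`, reusing the `client` and `query` defined above. A dry run does not execute the query, so it returns no rows.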

## **Conclusion**
This notebook demonstrated how to handle **big data in Google Colab** using:
- **Dask** for parallel computing
- **Pandas** for working with computed, in-memory results
- **Matplotlib** for visualization
- **Google BigQuery** for large-scale SQL analysis

🚀 **Now you can analyze large datasets efficiently in Colab!**