# Integration of Pandas with BigQuery

This notebook provides an overview and examples of how to integrate Pandas with BigQuery for efficient data analysis.

## Overview

Pandas is a Python library for data manipulation and analysis that provides data structures and operations for working with tabular and time-series data. BigQuery is Google Cloud's serverless, scalable data warehouse, which runs SQL queries on Google's infrastructure.

_Integrating Pandas with BigQuery combines the strengths of each: BigQuery processes very large datasets quickly, and Pandas provides a rich set of data manipulation tools for detailed analysis of the results._

In [None]:
from google.cloud import bigquery
from time import time

In [None]:
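# Create a client using the default credentials and project from the environment
client = bigquery.Client()
# Disable the query cache so that each run actually executes the query
job_config = bigquery.QueryJobConfig(use_query_cache=False)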

In [None]:
# Create a query string
query = """
SELECT
  order_status,
  COUNT(*) AS order_count
FROM
  pp-bigquery-02.demo_retail.orders
GROUP BY
  order_status;
"""

In [None]:
print(type(query))

#### Make an API request

In [None]:
query_job = client.query(query, job_config=job_config)

#### Load the query results into a Pandas DataFrame

In [None]:
df = query_job.result().to_dataframe()
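
The notebook imports `time` and disables the query cache, which suggests measuring how long the query takes. A minimal sketch of that idea follows, assuming pandas support for the BigQuery client is installed. The helper name `run_query_timed` is hypothetical and not part of the BigQuery client library, and `time.perf_counter` is used here in place of `time.time` because it is intended for measuring elapsed intervals.

```python
from time import perf_counter


def run_query_timed(client, query, job_config=None):
    """Run `query` and return (DataFrame, elapsed seconds).

    Hypothetical helper: `client` is expected to be a bigquery.Client,
    or any object exposing a compatible `query` method.
    """
    start = perf_counter()
    # Submit the query job
    query_job = client.query(query, job_config=job_config)
    # Wait for the job to finish, then download the rows into a DataFrame
    df = query_job.result().to_dataframe()
    return df, perf_counter() - start
```

With the objects defined earlier in the notebook, it could be called as `df, seconds = run_query_timed(client, query, job_config)`. The measured time covers both query execution and the download of the result rows.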

#### Displaying the Top Rows of a DataFrame

The `df.head()` method displays the first few rows of the DataFrame `df`. By default, it returns the first five rows.

In [None]:
df.head()
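
`head()` also accepts a row count. The sketch below shows this on a small hand-built DataFrame so that it runs without a BigQuery connection; the `sample` values are made up for illustration and are not the actual query results.

```python
import pandas as pd

# Made-up stand-in for the query result; the real values come from BigQuery.
sample = pd.DataFrame(
    {
        "order_status": ["placed", "shipped", "delivered", "cancelled", "returned", "pending"],
        "order_count": [10, 20, 30, 40, 50, 60],
    }
)

top_default = sample.head()      # first five rows (the default)
top_two = sample.head(2)         # first two rows
all_but_last = sample.head(-1)   # a negative n drops the last |n| rows

print(top_two)
```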

#### Displaying Summary Information of a DataFrame

The `df.info()` method prints a concise summary of the DataFrame `df`: the index range, column names, non-null counts, dtypes, and memory usage.

In [None]:
df.info()
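
`info()` prints its summary rather than returning it. If the text is needed as a string (for logging, say), one option is to pass a buffer through the `buf` parameter. The sketch below again uses a made-up `sample` DataFrame rather than the real query result.

```python
import io

import pandas as pd

# Made-up stand-in for the query result, with one missing value.
sample = pd.DataFrame(
    {"order_status": ["placed", "shipped", None], "order_count": [10, 20, 30]}
)

# Capture the summary text instead of printing it directly
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Related one-line checks
n_rows, n_cols = sample.shape                          # (rows, columns)
non_null_status = sample["order_status"].notna().sum() # non-missing values in one column
```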

#### Accessing DataFrame Column Names

The `df.columns` attribute returns an `Index` object containing the DataFrame's column labels.

In [None]:
df.columns
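
A few common operations on the column labels are sketched below. The `sample` DataFrame and the new column name `n_orders` are illustrative assumptions; only `order_status` and `order_count` come from the query above.

```python
import pandas as pd

# Made-up one-row stand-in with the same column names as the query result.
sample = pd.DataFrame({"order_status": ["placed"], "order_count": [10]})

column_index = sample.columns             # a pandas Index, not a plain list
column_names = sample.columns.tolist()    # convert to a regular Python list
has_status = "order_status" in sample.columns  # membership test on the labels

# rename() returns a new DataFrame and leaves `sample` unchanged
renamed = sample.rename(columns={"order_count": "n_orders"})
```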