### Working with Databases


**Python offers several libraries and tools to interact with different types of databases, including SQL and NoSQL databases.**
- **SQLite (sqlite3)**
  - A lightweight, serverless database included with Python, ideal for small to medium-sized applications and prototyping.
- **MySQL (mysql-connector-python, PyMySQL)**
  - A widely used relational database, perfect for large-scale applications and web development, with strong community support.
- **PostgreSQL (psycopg2, SQLAlchemy)**
  - An advanced, open-source relational database known for its extensibility and support for complex queries and data types.
- **MongoDB (pymongo)**
  - A NoSQL database that stores data in flexible, JSON-like documents, ideal for handling unstructured data and real-time analytics.
- **SQLAlchemy**
  - A SQL toolkit and ORM for Python. Its Core layer manages connections and builds SQL expressions, and its ORM maps tables to Python classes so you can work with objects instead of raw SQL.
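`sqlite3` needs no server and no installation, so it is a convenient way to see the connect / execute / fetch pattern that Python's DB-API drivers (such as `sqlite3`, `PyMySQL` and `psycopg2`) have in common. A minimal sketch using a throwaway in-memory database; the `orders` table and its two columns are illustrative only.

```python
import sqlite3

# ":memory:" creates a database that exists only while this connection is open
lite_conn = sqlite3.connect(":memory:")
cur = lite_conn.cursor()

# Create a small illustrative table
cur.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, sales REAL)")

# "?" placeholders pass values as parameters instead of pasting them into the SQL
cur.executemany(
    "INSERT INTO orders (order_id, sales) VALUES (?, ?)",
    [("CA-2020-149797", 841.568), ("CA-2018-132962", 15.552)],
)
lite_conn.commit()

cur.execute("SELECT order_id, sales FROM orders ORDER BY sales DESC")
rows = cur.fetchall()
print(rows)  # [('CA-2020-149797', 841.568), ('CA-2018-132962', 15.552)]

lite_conn.close()
```

The placeholder style is driver-specific: `sqlite3` uses `?`, while `PyMySQL` and `psycopg2` use `%s`.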


#### Connecting to SQL Server

We will use SQLAlchemy to connect to SQL Server.

In [None]:
# install SQLAlchemy and the pyodbc driver if they are not already present
# pip install sqlalchemy pyodbc

In [2]:
import pandas as pd
import sqlalchemy as sal

In [3]:
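# establishing a connection to SQL Server with Windows authentication
# mssql+pyodbc = SQL Server dialect via pyodbc; with no username in the URL, a trusted connection is used
# raw string (r'...') so the backslash before SQLEXPRESS is not read as an escape sequence
engine = sal.create_engine(r'mssql+pyodbc://<sql_server_name>\SQLEXPRESS/<database_name>?driver=ODBC+Driver+17+for+SQL+Server')
conn = engine.connect()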

In [None]:
# if you have pass and username both has to mentioned
# connection_string = ("mssql+pyodbc://<username>:<password>@<server>/<database>?driver=ODBC+Driver+17+for+SQL+Server")
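When the username or password contains characters such as `@`, `:` or `/`, they must be URL-encoded before going into the connection string; otherwise the URL is split in the wrong place. A minimal sketch using only the standard library's `urllib.parse`; the server, database, username and password values are placeholders, not real settings.

```python
from urllib.parse import quote, quote_plus

# Placeholder values; replace with your own
username = "etl_user"
password = "p@ss:word/1"
server = "localhost"
database = "sales_db"
driver = "ODBC Driver 17 for SQL Server"

# quote(..., safe="") percent-encodes every reserved character in the credentials;
# quote_plus turns the spaces in the driver name into "+" for the query string
connection_string = (
    f"mssql+pyodbc://{quote(username, safe='')}:{quote(password, safe='')}"
    f"@{server}/{database}?driver={quote_plus(driver)}"
)
print(connection_string)
# mssql+pyodbc://etl_user:p%40ss%3Aword%2F1@localhost/sales_db?driver=ODBC+Driver+17+for+SQL+Server
```

With SQLAlchemy 1.4 or later, `sqlalchemy.engine.URL.create()` can also build the URL from separate fields and takes care of this escaping.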

In [4]:
# read the orders table from SQL Server into a DataFrame
# pd.read_sql_query() executes an SQL query and returns the result as a pandas DataFrame
df_sql = pd.read_sql_query('SELECT * FROM orders', conn)
df_sql

| | row_id | order_id | order_date | ship_date | ship_mode | customer_id | customer_name | segment | country | city | ... | postal_code | region | product_id | category | sub_category | product_name | sales | quantity | discount | profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 925.0 | CA-2020-149797 | 2020-09-15 | 2020-09-20 | Standard Class | AH-10075 | Adam Hart | Corporate | United States | New York City | ... | 10011.0 | East | OFF-BI-10003650 | Office Supplies | Binders | GBC DocuBind 300 Electric Binding Machine | 841.568 | 2.0 | 0.2 | 294.5488 |
| 1 | 926.0 | CA-2018-132962 | 2018-09-13 | 2018-09-16 | First Class | JM-15535 | Jessica Myrick | Consumer | United States | Philadelphia | ... | 19143.0 | East | OFF-PA-10003543 | Office Supplies | Paper | Xerox 1985 | 15.552 | 3.0 | 0.2 | 5.4432 |
| 2 | 927.0 | CA-2018-132962 | 2018-09-13 | 2018-09-16 | First Class | JM-15535 | Jessica Myrick | Consumer | United States | Philadelphia | ... | 19143.0 | East | TEC-AC-10004353 | Technology | Accessories | Hypercom P1300 Pinpad | 252.000 | 5.0 | 0.2 | 53.5500 |
| 3 | 928.0 | CA-2019-115091 | 2019-10-05 | 2019-10-09 | Standard Class | JJ-15760 | Joel Jenkins | Home Office | United States | Springfield | ... | 22153.0 | South | OFF-AR-10000658 | Office Supplies | Art | Newell 324 | 46.200 | 4.0 | 0.0 | 12.9360 |
| 4 | 929.0 | CA-2019-115091 | 2019-10-05 | 2019-10-09 | Standard Class | JJ-15760 | Joel Jenkins | Home Office | United States | Springfield | ... | 22153.0 | South | OFF-AP-10000696 | Office Supplies | Appliances | Holmes Odor Grabber | 28.840 | 2.0 | 0.0 | 9.5172 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9989 | 8774.0 | CA-2019-132276 | 2019-02-23 | 2019-02-28 | Standard Class | LC-16960 | Lindsay Castell | Home Office | United States | New York City | ... | 10024.0 | East | OFF-AP-10000804 | Office Supplies | Appliances | Hoover Portapower Portable Vacuum | 26.880 | 6.0 | 0.0 | 6.7200 |
| 9990 | 8775.0 | CA-2019-132276 | 2019-02-23 | 2019-02-28 | Standard Class | LC-16960 | Lindsay Castell | Home Office | United States | New York City | ... | 10024.0 | East | OFF-BI-10002982 | Office Supplies | Binders | Avery Self-Adhesive Photo Pockets for Polaroid... | 10.896 | 2.0 | 0.2 | 3.8136 |
| 9991 | 8776.0 | CA-2020-163636 | 2020-12-05 | 2020-12-09 | Second Class | MP-18175 | Mike Pelletier | Home Office | United States | Chicago | ... | 60623.0 | Central | OFF-AR-10001547 | Office Supplies | Art | Newell 311 | 3.536 | 2.0 | 0.2 | 0.3094 |
| 9992 | 8777.0 | CA-2020-102813 | 2020-07-02 | 2020-07-03 | First Class | EA-14035 | Erin Ashbrook | Corporate | United States | Huntsville | ... | 77340.0 | Central | FUR-CH-10000665 | Furniture | Chairs | Global Airflow Leather Mesh Back Chair, Black | 528.430 | 5.0 | 0.3 | 0.0000 |
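`read_sql_query()` also accepts a plain `sqlite3` connection, so the same call can be tried without a SQL Server instance. A sketch under that assumption, using a tiny in-memory `orders` table with illustrative rows; it also passes the filter value through `params` rather than formatting it into the SQL string (the `?` placeholder is `sqlite3`'s style and differs between drivers).

```python
import sqlite3

import pandas as pd

demo_conn = sqlite3.connect(":memory:")

# Build a tiny illustrative "orders" table from a DataFrame
pd.DataFrame(
    {
        "order_id": ["CA-2020-149797", "CA-2018-132962", "CA-2019-115091"],
        "region": ["East", "East", "South"],
        "sales": [841.568, 15.552, 46.2],
    }
).to_sql("orders", demo_conn, index=False)

# The driver binds the value in params to the "?" placeholder
east_df = pd.read_sql_query(
    "SELECT order_id, sales FROM orders WHERE region = ? ORDER BY sales DESC",
    demo_conn,
    params=("East",),
)
print(east_df)

demo_conn.close()
```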



### Extract, Transform, Load (ETL)

#### Extract:
We will read the data from the orders and returns datasets (`orders.txt` and `returns.txt`).

#### Transform:
We will join the two datasets on their shared `order_id` column.

#### Load:
Finally, we will load the combined data into a SQL Server table (`orders_final`).
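Because the transform step uses `how='inner'`, the combined table keeps only orders whose `order_id` also appears in the returns data. A small sketch with made-up IDs contrasting it with a left join, which keeps every order; `indicator=True` adds a `_merge` column recording whether each row matched.

```python
import pandas as pd

# Illustrative data; the IDs and amounts are made up
orders = pd.DataFrame({"order_id": ["A", "B", "C"], "sales": [10.0, 20.0, 30.0]})
returns = pd.DataFrame({"order_id": ["B"], "return_reason": ["Wrong Items"]})

inner = pd.merge(orders, returns, how="inner", on="order_id")
left = pd.merge(orders, returns, how="left", on="order_id", indicator=True)

print(inner)  # only order B survives
print(left)   # all three orders; return_reason is NaN for A and C
```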

In [16]:
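def extract():
    # read the raw orders and returns data from comma-separated text files
    order_df = pd.read_csv('orders.txt')
    returns_df = pd.read_csv('returns.txt')
    return order_df, returns_df


def transform(order_df, returns_df):
    # inner join: keep only orders that appear in both datasets
    joined_df = pd.merge(order_df, returns_df, how='inner', on='order_id')
    return joined_df


def load(joined_df):
    # pass the engine rather than the open connection: pandas then opens its own
    # connection and commits the insert (SQLAlchemy 2.0 does not autocommit)
    # if_exists='append' adds rows to orders_final, creating the table if needed
    joined_df.to_sql('orders_final', con=engine, index=False, if_exists='append')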
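With `if_exists='append'`, every run of `load()` inserts the rows again, so re-running the notebook duplicates data in `orders_final`. A sketch against an in-memory SQLite database standing in for SQL Server, comparing the row count after two appends with the count after two replaces; the table name `demo_final` is hypothetical.

```python
import sqlite3

import pandas as pd

sqlite_conn = sqlite3.connect(":memory:")
frame = pd.DataFrame({"order_id": ["A", "B"], "return_reason": ["Wrong Items", "Others"]})


def count_rows(table):
    # f-string only because the table name is a trusted constant here
    return pd.read_sql_query(f"SELECT COUNT(*) AS n FROM {table}", sqlite_conn)["n"].iloc[0]


# Two appends: the second run adds the same two rows again
for _ in range(2):
    frame.to_sql("demo_final", sqlite_conn, index=False, if_exists="append")
appended = count_rows("demo_final")  # 4

# Two replaces: the table is dropped and recreated on each run
for _ in range(2):
    frame.to_sql("demo_final", sqlite_conn, index=False, if_exists="replace")
replaced = count_rows("demo_final")  # 2

print(appended, replaced)
sqlite_conn.close()
```

Note that `'replace'` drops the table first, so any indexes or column types defined on it in the database are lost; emptying the table before appending is an alternative when those need to be kept.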

In [18]:
order_df, returns_df = extract()

joined_df = transform(order_df, returns_df)

load(joined_df)
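To try the whole extract, transform and load flow without a SQL Server instance or the `.txt` files, the same three steps can be pointed at in-memory CSV text and a SQLite database. A sketch under those assumptions: the sample rows are made up, and the functions carry a `_demo` suffix so they do not overwrite the notebook's own `extract`, `transform` and `load`.

```python
import io
import sqlite3

import pandas as pd

# Made-up stand-ins for orders.txt and returns.txt
ORDERS_CSV = """order_id,customer_name,sales
CA-2020-104689,Customer A,120.5
CA-2020-105081,Customer B,80.0
CA-2019-115091,Customer C,46.2
"""
RETURNS_CSV = """order_id,return_reason
CA-2020-104689,Wrong Items
CA-2020-105081,Others
"""


def extract_demo():
    # read_csv accepts any file-like object, so StringIO stands in for a file path
    orders = pd.read_csv(io.StringIO(ORDERS_CSV))
    returns = pd.read_csv(io.StringIO(RETURNS_CSV))
    return orders, returns


def transform_demo(orders, returns):
    # inner join: keep only orders that were returned
    return pd.merge(orders, returns, how="inner", on="order_id")


def load_demo(joined, con):
    # 'replace' keeps the demo repeatable if it is run more than once
    joined.to_sql("orders_final", con=con, index=False, if_exists="replace")


sqlite_db = sqlite3.connect(":memory:")
load_demo(transform_demo(*extract_demo()), sqlite_db)

# Read the loaded table back to check what landed in the database
loaded = pd.read_sql_query("SELECT * FROM orders_final ORDER BY order_id", sqlite_db)
print(loaded)
sqlite_db.close()
```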

In [9]:
returns_df

| | order_id | return_reason |
|---|---|---|
| 0 | CA-2020-104689 | Wrong Items |
| 1 | CA-2020-105081 | Wrong Items |
| 2 | CA-2020-105291 | Wrong Items |
| 3 | CA-2020-105585 | Wrong Items |
| 4 | CA-2020-106950 | Wrong Items |
| ... | ... | ... |
| 291 | US-2021-136679 | Others |
| 292 | US-2021-147886 | Others |
| 293 | US-2021-147998 | Wrong Items |
| 294 | US-2021-151127 | Wrong Items |
