# SELECT {fields} FROM {table} WHERE {condition}

- Also covers LIKE with wildcards on text fields.
- Also covers the DISTINCT keyword, which returns each distinct combination of the selected columns only once.
- Once the connection is established and the query is defined, the query result can be converted into a DataFrame (DF) in at least two ways:
   1. Run the query through the cursor and convert the resulting list into a DF.
   2. Run the query directly from pandas with pd.read_sql().
- PANDAS: reproduce the same queries using ONLY pandas (first import the table as a DF).
- See also: https://www.freecodecamp.org/news/connect-python-with-sql/

## 1. Establish the connection - connecting to the DB

In [29]:
### Connect to the DB - Establish the connection
import pyodbc

# Valid values for the connection string
driver = '{ODBC Driver 17 for SQL Server}'
server = '(local)'
dbname = 'AdventureWorks2019'
#dbname = 'BikeStores'
user = 'user1'
passwd = 'pass1'

# Construct the Connection String
connection_string = f'DRIVER={driver};SERVER={server};\
    DATABASE={dbname};UID={user};PWD={passwd}'
print('Connection String:\n', connection_string)

# Establish the connection
try:
    connection = pyodbc.connect(connection_string)
    cur = connection.cursor()
    print('SUCCESS: Connection Established')
except pyodbc.Error as e:
    print('ERROR:', e)

Connection String:
 DRIVER={ODBC Driver 17 for SQL Server};SERVER=(local);    DATABASE=AdventureWorks2019;UID=user1;PWD=pass1
SUCCESS: Connection Established
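
The printed connection string above has a run of spaces before `DATABASE`: the backslash continuation inside the f-string keeps the indentation of the following line. The driver accepted it here, but a minimal sketch of one way to avoid the stray whitespace is to join the parts with `;`. The helper name `build_connection_string` is hypothetical; the values are the ones from the cell above.

```python
def build_connection_string(**parts):
    """Join KEY=value pairs with ';' so no source indentation leaks in."""
    return ';'.join(f'{key}={value}' for key, value in parts.items())


connection_string = build_connection_string(
    DRIVER='{ODBC Driver 17 for SQL Server}',
    SERVER='(local)',
    DATABASE='AdventureWorks2019',
    UID='user1',
    PWD='pass1',
)
print(connection_string)
```

The resulting string can be passed to `pyodbc.connect()` exactly as before.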


## 2. Using SQL Server and querying the DB directly with a cursor

In [30]:
# Helper functions that run a SQL query and return the result as a DataFrame
import pandas as pd

def df_from_query(qry):
    """Execute qry and build a DataFrame by iterating over the cursor."""
    cur.execute(qry)
    field_names = [i[0] for i in cur.description]   # column names
    get_data = [list(x) for x in cur]               # each row -> plain list
    df = pd.DataFrame(data=get_data, columns=field_names)
    return df

def df_from_fetchall(qry):
    """Same result as df_from_query, but rows come from cursor.fetchall()."""
    cur.execute(qry)
    results = cur.fetchall()
    from_db = [list(r) for r in results]            # each row -> plain list
    cols = [i[0] for i in cur.description]          # column names
    df = pd.DataFrame(data=from_db, columns=cols)
    return df
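
Both helpers above rely on the global `cur`. As a sketch of the same cursor-to-DataFrame pattern that can be tried without a SQL Server instance, the version below takes the cursor as an argument and runs against an in-memory SQLite database. Assumptions: `df_from_cursor` is a hypothetical name, `sqlite3` is only a stand-in for the pyodbc connection, and the three rows are copied from the outputs in this notebook.

```python
import sqlite3

import pandas as pd


def df_from_cursor(cursor, qry, params=()):
    """Run qry on a DB-API cursor and return the result as a DataFrame."""
    cursor.execute(qry, params)
    columns = [col[0] for col in cursor.description]   # column names
    rows = [tuple(row) for row in cursor.fetchall()]   # rows as plain tuples
    return pd.DataFrame(rows, columns=columns)


# Stand-in database (assumption): a tiny in-memory copy of a few products.
conn = sqlite3.connect(':memory:')
demo_cur = conn.cursor()
demo_cur.execute(
    'CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)')
demo_cur.executemany(
    'INSERT INTO Product VALUES (?, ?, ?)',
    [(1, 'Adjustable Race', 0.0),
     (877, 'Bike Wash - Dissolver', 7.95),
     (879, 'All-Purpose Bike Stand', 159.0)],
)

demo_df = df_from_cursor(demo_cur, 'SELECT * FROM Product')
print(demo_df)
conn.close()
```

Passing the cursor explicitly makes the helper usable with any open connection and easier to test in isolation.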

In [31]:
### First query: a first look at the products table
query1 = ''' SELECT *
            FROM Production.Product; '''

products_table_df = df_from_query(query1)

print(query1)
display(products_table_df.iloc[[0, 5, -5, -1]])
products_table_df.columns   # to see all column names of the table

 SELECT *
            FROM Production.Product; 


Unnamed: 0,ProductID,Name,ProductNumber,MakeFlag,FinishedGoodsFlag,Color,SafetyStockLevel,ReorderPoint,StandardCost,ListPrice,...,ProductLine,Class,Style,ProductSubcategoryID,ProductModelID,SellStartDate,SellEndDate,DiscontinuedDate,rowguid,ModifiedDate
0,1,Adjustable Race,AR-5381,False,False,,1000,750,0.0,0.0,...,,,,,,2008-04-30,NaT,,694215B7-08F7-4C0D-ACB1-D734BA44C0C8,2014-02-08 10:01:36.827
5,317,LL Crankarm,CA-5965,False,False,Black,500,375,0.0,0.0,...,,L,,,,2008-04-30,NaT,,3C9D10B7-A6B2-4774-9963-C19DCEE72FEA,2014-02-08 10:01:36.827
499,995,ML Bottom Bracket,BB-8107,True,True,,500,375,44.9506,101.24,...,,M,,5.0,96.0,2013-05-30,NaT,,71AB847F-D091-42D6-B735-7B0C2D82FC84,2014-02-08 10:01:36.827
503,999,"Road-750 Black, 52",BK-R19B-52,True,True,Black,100,75,343.6496,539.99,...,R,L,U,2.0,31.0,2013-05-30,NaT,,AE638923-2B67-4679-B90E-ABBAB17DCA31,2014-02-08 10:01:36.827


Index(['ProductID', 'Name', 'ProductNumber', 'MakeFlag', 'FinishedGoodsFlag',
       'Color', 'SafetyStockLevel', 'ReorderPoint', 'StandardCost',
       'ListPrice', 'Size', 'SizeUnitMeasureCode', 'WeightUnitMeasureCode',
       'Weight', 'DaysToManufacture', 'ProductLine', 'Class', 'Style',
       'ProductSubcategoryID', 'ProductModelID', 'SellStartDate',
       'SellEndDate', 'DiscontinuedDate', 'rowguid', 'ModifiedDate'],
      dtype='object')

In [32]:
### Find products whose name matches user input, with ListPrice > 0

# Input a string to search for in the product name
prod_to_find = input('Enter a product to find:')

query2 =f''' SELECT ProductID, Name, ListPrice, SellEndDate, DiscontinuedDate
            FROM Production.Product
            WHERE ListPrice > 0 AND Name LIKE '%{prod_to_find}%' '''

product_type_df = df_from_query(query2)

print(query2)
product_type_df

 SELECT ProductID, Name, ListPrice, SellEndDate, DiscontinuedDate
            FROM Production.Product
            WHERE ListPrice > 0 AND Name LIKE '%bike%' 


Unnamed: 0,ProductID,Name,ListPrice,SellEndDate,DiscontinuedDate
0,879,All-Purpose Bike Stand,159.0,NaT,
1,877,Bike Wash - Dissolver,7.95,NaT,
2,876,Hitch Rack - 4-Bike,120.0,NaT,
3,710,"Mountain Bike Socks, L",9.5,2012-05-29,
4,709,"Mountain Bike Socks, M",9.5,2012-05-29,
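
`query2` pastes the user's input straight into the SQL text, so an input containing a quote breaks the statement and leaves the query open to SQL injection. A minimal sketch of the same search with a `?` placeholder follows. Assumptions: `sqlite3` stands in for the pyodbc connection (both accept `?` placeholders), and the table holds only three rows copied from the outputs above.

```python
import sqlite3

# Stand-in database (assumption): a tiny in-memory copy of a few products.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)')
conn.executemany(
    'INSERT INTO Product VALUES (?, ?, ?)',
    [(317, 'LL Crankarm', 0.0),
     (877, 'Bike Wash - Dissolver', 7.95),
     (879, 'All-Purpose Bike Stand', 159.0)],
)

prod_to_find = 'bike'   # would come from input() in the notebook

query = '''SELECT ProductID, Name, ListPrice
           FROM Product
           WHERE ListPrice > 0 AND Name LIKE ?
           ORDER BY ProductID'''
# The wildcards go into the parameter value, not into the SQL text.
rows = conn.execute(query, (f'%{prod_to_find}%',)).fetchall()
print(rows)
conn.close()
```

With pyodbc the call would look the same on the cursor: `cur.execute(query, f'%{prod_to_find}%')`.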


In [33]:
# Try df_from_fetchall with the same query2
ptype_df = df_from_fetchall(query2)
ptype_df

Unnamed: 0,ProductID,Name,ListPrice,SellEndDate,DiscontinuedDate
0,879,All-Purpose Bike Stand,159.0,NaT,
1,877,Bike Wash - Dissolver,7.95,NaT,
2,876,Hitch Rack - 4-Bike,120.0,NaT,
3,710,"Mountain Bike Socks, L",9.5,2012-05-29,
4,709,"Mountain Bike Socks, M",9.5,2012-05-29,


In [34]:
### DISTINCT https://www.mssqltips.com/sqlservertip/6810/sql-select-distinct-examples/
# query3 = ''' SELECT DISTINCT ListPrice, ProductID, Name, SellEndDate, DiscontinuedDate
query3 = ''' SELECT DISTINCT Color, ListPrice
            FROM Production.Product
            WHERE Name LIKE 'Mountain-500%' '''
diffprice_mountain_df = df_from_query(query3)

print(query3)
diffprice_mountain_df
# Not the best example of DISTINCT, but we will practice more later

 SELECT DISTINCT Color, ListPrice
            FROM Production.Product
            WHERE Name LIKE 'Mountain-500%' 


Unnamed: 0,Color,ListPrice
0,Black,539.99
1,Silver,564.99


## 3. Using the pandas SQL method pd.read_sql()

In [35]:
# Same query as query2
df = pd.read_sql(query2, connection)
df
# This call raises a pandas UserWarning; it is revisited later in trySQLAlchemy.py
# Also try running the query from a simple text script



Unnamed: 0,ProductID,Name,ListPrice,SellEndDate,DiscontinuedDate
0,879,All-Purpose Bike Stand,159.0,NaT,
1,877,Bike Wash - Dissolver,7.95,NaT,
2,876,Hitch Rack - 4-Bike,120.0,NaT,
3,710,"Mountain Bike Socks, L",9.5,2012-05-29,
4,709,"Mountain Bike Socks, M",9.5,2012-05-29,
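
`pd.read_sql()` also accepts a `params` argument that is passed on to the driver, so the search text does not have to be formatted into the query string. A minimal sketch, assuming an in-memory `sqlite3` database as a stand-in for the pyodbc connection (pandas accepts `sqlite3` connections directly) and three rows copied from the outputs above:

```python
import sqlite3

import pandas as pd

# Stand-in database (assumption): a tiny in-memory copy of a few products.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE Product (ProductID INTEGER, Name TEXT, ListPrice REAL)')
conn.executemany(
    'INSERT INTO Product VALUES (?, ?, ?)',
    [(317, 'LL Crankarm', 0.0),
     (877, 'Bike Wash - Dissolver', 7.95),
     (879, 'All-Purpose Bike Stand', 159.0)],
)
conn.commit()

query = '''SELECT ProductID, Name, ListPrice
           FROM Product
           WHERE ListPrice > ? AND Name LIKE ?
           ORDER BY ProductID'''
# One value per '?' placeholder, in order of appearance.
bike_df = pd.read_sql(query, conn, params=(0, '%bike%'))
print(bike_df)
conn.close()
```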


## 4. Using only pandas

In [36]:
## First import the whole table into a DataFrame (already done in query1)
print(products_table_df.shape)
products_table_df.columns   # to see all column names of the table

(504, 25)


Index(['ProductID', 'Name', 'ProductNumber', 'MakeFlag', 'FinishedGoodsFlag',
       'Color', 'SafetyStockLevel', 'ReorderPoint', 'StandardCost',
       'ListPrice', 'Size', 'SizeUnitMeasureCode', 'WeightUnitMeasureCode',
       'Weight', 'DaysToManufacture', 'ProductLine', 'Class', 'Style',
       'ProductSubcategoryID', 'ProductModelID', 'SellStartDate',
       'SellEndDate', 'DiscontinuedDate', 'rowguid', 'ModifiedDate'],
      dtype='object')

In [37]:
## query1 is as simple as displaying products_table_df
# products_table_df
df = products_table_df      # alias: same object, not a copy (use .copy() for an independent DataFrame)
df.iloc[[0, 9, -9, -1]]

Unnamed: 0,ProductID,Name,ProductNumber,MakeFlag,FinishedGoodsFlag,Color,SafetyStockLevel,ReorderPoint,StandardCost,ListPrice,...,ProductLine,Class,Style,ProductSubcategoryID,ProductModelID,SellStartDate,SellEndDate,DiscontinuedDate,rowguid,ModifiedDate
0,1,Adjustable Race,AR-5381,False,False,,1000,750,0.0,0.0,...,,,,,,2008-04-30,NaT,,694215B7-08F7-4C0D-ACB1-D734BA44C0C8,2014-02-08 10:01:36.827
9,321,Chainring Nut,CN-6137,False,False,Silver,1000,750,0.0,0.0,...,,,,,,2008-04-30,NaT,,3314B1D7-EF69-4431-B6DD-DC75268BD5DF,2014-02-08 10:01:36.827
495,991,"Mountain-500 Black, 44",BK-M18B-44,True,True,Black,100,75,294.5797,539.99,...,M,L,U,1.0,23.0,2013-05-30,NaT,,C7852127-2FB8-4959-B5A3-DE5A8D6445E4,2014-02-08 10:01:36.827
503,999,"Road-750 Black, 52",BK-R19B-52,True,True,Black,100,75,343.6496,539.99,...,R,L,U,2.0,31.0,2013-05-30,NaT,,AE638923-2B67-4679-B90E-ABBAB17DCA31,2014-02-08 10:01:36.827


In [38]:
## query2:

query2 =f''' SELECT ProductID, Name, ListPrice, SellEndDate, DiscontinuedDate
            FROM Production.Product
            WHERE ListPrice > 0 AND Name LIKE '%{prod_to_find}%' '''

## Case-sensitive text comparison
#df[['ProductID', 'Name']].loc[(df.ListPrice > 0) & (df.Name.str.contains(prod_to_find))]
#df[['ProductID', 'Name']].loc[(df.ListPrice > 0) & (df['Name'].apply(lambda x: prod_to_find in x))]

## Case-insensitive, matching the behaviour of the SQL LIKE query above
#df[['ProductID', 'Name', 'ListPrice', 'SellEndDate', 'DiscontinuedDate']].loc[
#   (df.ListPrice > 0) & (df.Name.str.contains(prod_to_find, case=False))]

cols_to_show = ['ProductID', 'Name', 'ListPrice', 'SellEndDate', 'DiscontinuedDate']
condition = (df.ListPrice > 0) & (df.Name.str.contains(prod_to_find, case=False))
df[cols_to_show].loc[condition]
#print(condition)

Unnamed: 0,ProductID,Name,ListPrice,SellEndDate,DiscontinuedDate
213,709,"Mountain Bike Socks, M",9.5,2012-05-29,
214,710,"Mountain Bike Socks, L",9.5,2012-05-29,
380,876,Hitch Rack - 4-Bike,120.0,NaT,
381,877,Bike Wash - Dissolver,7.95,NaT,
383,879,All-Purpose Bike Stand,159.0,NaT,
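
The position of the `%` wildcard in a LIKE pattern maps onto different pandas string methods. The sketch below shows three common cases on a small hand-made Series (the names are taken from the outputs in this notebook, plus one `None` to show the `na` argument). Note that `str.contains()` treats its pattern as a regular expression by default, so `regex=False` is used to match user input literally.

```python
import pandas as pd

names = pd.Series(['Mountain-500 Black, 44', 'Mountain Bike Socks, M',
                   'Hitch Rack - 4-Bike', 'Chain', None])

# LIKE 'Mountain%'  -> name starts with the text
starts = names.str.startswith('Mountain', na=False)

# LIKE '%Bike'      -> name ends with the text
ends = names.str.endswith('Bike', na=False)

# LIKE '%bike%'     -> name contains the text, ignoring case;
# regex=False keeps characters such as '.' or '+' literal.
contains = names.str.contains('bike', case=False, regex=False, na=False)

print(names[contains])
```

`na=False` makes missing names count as "no match" instead of propagating a missing value into the boolean mask.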


In [39]:
## Once again, because the two results do not look the same (different row order and index)
prod = 'bike'

query4 = f''' SELECT ProductID, Name, ListPrice
            FROM Production.Product
            WHERE Name LIKE '%{prod}%' '''
rq_df = df_from_query(query4)
display(rq_df)

query_full = ''' SELECT *
                FROM Production.Product'''
df = df_from_query(query_full)

df[['ProductID', 'Name', 'ListPrice']].loc[
    df['Name'].str.contains(prod, case=False)]

#print(prod)

Unnamed: 0,ProductID,Name,ListPrice
0,879,All-Purpose Bike Stand,159.0
1,877,Bike Wash - Dissolver,7.95
2,876,Hitch Rack - 4-Bike,120.0
3,710,"Mountain Bike Socks, L",9.5
4,709,"Mountain Bike Socks, M",9.5


Unnamed: 0,ProductID,Name,ListPrice
213,709,"Mountain Bike Socks, M",9.5
214,710,"Mountain Bike Socks, L",9.5
380,876,Hitch Rack - 4-Bike,120.0
381,877,Bike Wash - Dissolver,7.95
383,879,All-Purpose Bike Stand,159.0
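
The two outputs above hold the same products; they differ only in row order and index labels. A minimal sketch of one way to check that, by sorting on `ProductID` and resetting the index before comparing. The two frames are rebuilt by hand from the outputs above (the `Name` column is left out for brevity), and `normalise` is a hypothetical helper name.

```python
import pandas as pd

# Result of the SQL query, in the order the server returned it.
sql_result = pd.DataFrame({
    'ProductID': [879, 877, 876, 710, 709],
    'ListPrice': [159.0, 7.95, 120.0, 9.5, 9.5],
})

# Result of the pandas filter, keeping the index of the full table.
pandas_result = pd.DataFrame({
    'ProductID': [709, 710, 876, 877, 879],
    'ListPrice': [9.5, 9.5, 120.0, 7.95, 159.0],
}, index=[213, 214, 380, 381, 383])


def normalise(frame):
    """Sort by ProductID and drop the old index so frames can be compared."""
    return frame.sort_values('ProductID').reset_index(drop=True)


same = normalise(sql_result).equals(normalise(pandas_result))
print(same)
```

On the SQL side, adding `ORDER BY ProductID` to the query would give a stable row order as well.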


In [40]:
# All columns
df[(df.ListPrice > 0) & (df.Name.str.contains(prod_to_find, case=False))]

Unnamed: 0,ProductID,Name,ProductNumber,MakeFlag,FinishedGoodsFlag,Color,SafetyStockLevel,ReorderPoint,StandardCost,ListPrice,...,ProductLine,Class,Style,ProductSubcategoryID,ProductModelID,SellStartDate,SellEndDate,DiscontinuedDate,rowguid,ModifiedDate
213,709,"Mountain Bike Socks, M",SO-B909-M,False,True,White,4,3,3.3963,9.5,...,M,,U,23.0,18.0,2011-05-31,2012-05-29,,18F95F47-1540-4E02-8F1F-CC1BCB6828D0,2014-02-08 10:01:36.827
214,710,"Mountain Bike Socks, L",SO-B909-L,False,True,White,4,3,3.3963,9.5,...,M,,U,23.0,18.0,2011-05-31,2012-05-29,,161C035E-21B3-4E14-8E44-AF508F35D80A,2014-02-08 10:01:36.827
380,876,Hitch Rack - 4-Bike,RA-H123,False,True,,4,3,44.88,120.0,...,S,,,26.0,118.0,2013-05-30,NaT,,7A0C4BBD-9679-4F59-9EBC-9DAF3439A38A,2014-02-08 10:01:36.827
381,877,Bike Wash - Dissolver,CL-9009,False,True,,4,3,2.9733,7.95,...,S,,,29.0,119.0,2013-05-30,NaT,,3C40B5AD-E328-4715-88A7-EC3220F02ACF,2014-02-08 10:01:36.827
383,879,All-Purpose Bike Stand,ST-1401,False,True,,4,3,59.466,159.0,...,M,,,27.0,122.0,2013-05-30,NaT,,C7BB564B-A637-40F5-B21B-CBF7E4F713BE,2014-02-08 10:01:36.827


In [41]:
# So far we used read_sql to explore the DB directly
# DataFrames also have a df.query() method, which filters rows like a boolean mask in .loc[]

color = 'Silver'        # color = input('What Color? ')
price = 100             # price = int(input('Max price? '))

cols_to_show = ['ProductID', 'Name', 'Color', 'ListPrice',
                'SellEndDate', 'DiscontinuedDate']
# DataFrame.query()
display(df[cols_to_show].query(
    f'Color == "{color}" and ListPrice < {price} and ListPrice > 0'))
# DataFrame.query() has no LIKE operator as in native SQL (at least I could not get one to work)

# DataFrame.loc[]
display(df[cols_to_show].loc[(df.Color == color) &
                             (df.ListPrice < price) & (df.ListPrice > 0)])

cols_str = (', ').join(cols_to_show)
print(cols_str)

# query the DB and create a DataFrame from the result
query5 = f''' SELECT {cols_str}
            FROM Production.Product
            WHERE Color = '{color}' and ListPrice < {price} and
                ListPrice > 0 '''

res_df = df_from_query(query5)
res_df

# Open question: what is the difference between NaT, NaN and None in these outputs (and should the index be reset)?
# A speed comparison is also needed


Unnamed: 0,ProductID,Name,Color,ListPrice,SellEndDate,DiscontinuedDate
384,880,Hydration Pack - 70 oz.,Silver,54.99,NaT,
449,945,Front Derailleur,Silver,91.49,NaT,
456,952,Chain,Silver,20.24,NaT,


Unnamed: 0,ProductID,Name,Color,ListPrice,SellEndDate,DiscontinuedDate
384,880,Hydration Pack - 70 oz.,Silver,54.99,NaT,
449,945,Front Derailleur,Silver,91.49,NaT,
456,952,Chain,Silver,20.24,NaT,


ProductID, Name, Color, ListPrice, SellEndDate, DiscontinuedDate


Unnamed: 0,ProductID,Name,Color,ListPrice,SellEndDate,DiscontinuedDate
0,880,Hydration Pack - 70 oz.,Silver,54.99,,
1,945,Front Derailleur,Silver,91.49,,
2,952,Chain,Silver,20.24,,
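
On the NaT / NaN / None question: `None` is Python's null object, `NaN` is the floating-point missing marker, and `NaT` is the missing marker for datetime columns. One plausible reading of the outputs above is that `SellEndDate` shows `NaT` in the full table because that column also contains real dates, while in the three-row SQL result every value is `None`, so pandas has no dates to infer a datetime column from. A small sketch with hand-made data (not the real table):

```python
import datetime as dt

import pandas as pd

# A column with at least one real date: the missing value becomes NaT.
mixed = pd.DataFrame({'SellEndDate': [dt.datetime(2012, 5, 29), None]})

# A column holding only None: there are no dates to infer a datetime type from.
all_none = pd.DataFrame({'SellEndDate': [None, None]})

# A numeric column: the missing value becomes NaN.
prices = pd.Series([9.5, None])

print(mixed.dtypes['SellEndDate'], all_none.dtypes['SellEndDate'], prices.dtype)

# pd.isna() / .isna() treat None, NaN and NaT alike as missing.
print(mixed['SellEndDate'].isna().tolist())
print(all_none['SellEndDate'].isna().tolist())
print(prices.isna().tolist())
```

For filtering, `.isna()` and `.notna()` therefore behave the same regardless of which marker a column uses.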


In [42]:
## Speed comparison in this cell
%timeit pdquery_df = df[cols_to_show].query(f'Color == "{color}" and ListPrice < {price} and ListPrice > 0')

%timeit pdloc_df = df[cols_to_show].loc[(df.Color == color) & (df.ListPrice < price) & (df.ListPrice > 0)]

%timeit sqlquery_df = df_from_query(query5)

%timeit pdsqlquery_df = pd.read_sql(query5, connection)


3.32 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.36 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
895 µs ± 131 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2.83 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
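
`%timeit` only works inside IPython/Jupyter. As a sketch of how the two in-memory filters could be timed from a plain script, the standard-library `timeit` module can be used instead. The small frame below is hand-made (an assumption, not the real table), so the absolute numbers will not match the ones above.

```python
import timeit

import pandas as pd

# Hand-made stand-in for the products table (assumption).
df = pd.DataFrame({
    'Color': ['Silver', 'Black', 'Silver', 'Silver'],
    'ListPrice': [54.99, 539.99, 564.99, 0.0],
})
color = 'Silver'
price = 100


def with_query():
    return df.query(
        f'Color == "{color}" and ListPrice < {price} and ListPrice > 0')


def with_loc():
    return df.loc[(df.Color == color) &
                  (df.ListPrice < price) & (df.ListPrice > 0)]


# Both filters must select the same rows before timing means anything.
assert with_query().equals(with_loc())

for fn in (with_query, with_loc):
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f'{fn.__name__}: {best * 1e6:.0f} µs per call')
```

Timings on a table this small mostly measure fixed overhead, so they say little about how each approach scales with more rows.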




- https://datatofish.com/sql-to-pandas-dataframe/
- https://learn.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-pandas?view=sql-server-ver16
- https://stackoverflow.com/questions/43175382/python-create-a-pandas-data-frame-from-a-list
- https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sql-using-pyodbc?view=sql-server-ver16
- https://stackoverflow.com/questions/3783238/python-database-connection-close
- https://www.freecodecamp.org/news/connect-python-with-sql/