### Relational Operations   
The SELECT command uses 13 relational operations.   
SELECT can also appear as a clause inside another statement (for example, in a subquery).   

##### Fundamental Operations  
1. Restriction          E.g., LENGTH(name) < 5
2. Projection           SELECT certain columns 
3. Cartesian Product  
4. Union  
5. Difference  
6. Rename  

---
##### Additional Operations   
7. Intersection  
8. Natural Join  
9. Assign  
##### Extended Operations   
10. Generalized Projection  
11. Left Outer Join  
12. Right Outer Join  
13. Full Outer Join  
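
Several of the operations listed above map directly onto SQL keywords. The sketch below is a minimal illustration on two made-up single-column tables, `a` and `b` (hypothetical names and rows, not part of the course data), using an in-memory SQLite database.

```python
import sqlite3

# Hypothetical toy tables, used only to illustrate the operations.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('Asha'), ('Ravi'), ('Meenakshi');
    INSERT INTO b VALUES ('Ravi'), ('John');
""")

def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in con.execute(sql))

restriction = names("SELECT name FROM a WHERE LENGTH(name) < 5")
union = names("SELECT name FROM a UNION SELECT name FROM b")
difference = names("SELECT name FROM a EXCEPT SELECT name FROM b")
intersection = names("SELECT name FROM a INTERSECT SELECT name FROM b")
# Cartesian product: every row of a paired with every row of b.
product_size = con.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

print(restriction, union, difference, intersection, product_size)
```

In SQLite, Difference is spelled `EXCEPT` and Cartesian Product is `CROSS JOIN`.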

SELECT
```
SELECT output FROM input WHERE filter.
   SELECT ALL (default) or SELECT DISTINCT
Result columns can be given explicit names with an optional AS clause 
WHERE clause is used to filter rows
GROUP BY clause allows sets of rows in the result set to be collapsed into single rows.
   HAVING clause can be used in conjunction with a GROUP BY clause to filter aggregate results 
ORDER BY clause sorts the result set into a specific order   
LIMIT clause can be used to control how many rows are returned
```   
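
The clauses above can be combined in one statement. The following is a sketch only: it builds a reduced, three-column stand-in for `VoterCounts` in memory (an assumption made for brevity), filled with a few rows that appear in the query outputs later in this notebook.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reduced, illustrative version of VoterCounts (three columns only).
con.execute("CREATE TABLE VoterCounts (District TEXT, AC_Name TEXT, Total INTEGER)")
con.executemany("INSERT INTO VoterCounts VALUES (?, ?, ?)", [
    ("BAGALKOT", "Hungund", 222446),
    ("BIJAPUR", "Muddebihal", 217009),
    ("BIJAPUR", "Devara Hippargi", 219131),
    ("BIJAPUR", "Basavana Bagevadi", 209814),
    ("UDUPI", "Kaup", 188952),
    ("UDUPI", "Karkala", 190577),
    ("BANGALORE URBAN", "Bangalore South", 695742),
    ("BANGALORE URBAN", "Mahadevapura", 607158),
])

sql = """
    SELECT District, COUNT(*) AS Constituencies, SUM(Total) AS Voters
    FROM VoterCounts
    WHERE Total > 200000          -- filter rows before grouping
    GROUP BY District             -- collapse each district to one row
    HAVING COUNT(*) >= 2          -- filter the groups
    ORDER BY Voters DESC          -- sort the grouped result
    LIMIT 2                       -- keep the first two rows
"""
result = con.execute(sql).fetchall()
for row in result:
    print(row)
```

WHERE drops rows before grouping, while HAVING drops whole groups after aggregation.
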
DELETE
```
DELETE FROM database_name.table_name;
DELETE FROM database_name.table_name WHERE id = 42;
```
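
A minimal sketch of both DELETE forms from Python, assuming a throwaway table named `items` (hypothetical) in an in-memory database. In SQLite the default database is addressed as `main`, which plays the role of `database_name` here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)", [(41, "a"), (42, "b"), (43, "c")])

# Delete one row; the ? placeholder keeps the value out of the SQL string.
deleted = con.execute("DELETE FROM main.items WHERE id = ?", (42,)).rowcount
remaining = [row[0] for row in con.execute("SELECT id FROM items ORDER BY id")]

# Without a WHERE clause, every row in the table is deleted.
con.execute("DELETE FROM main.items")
con.commit()
left = con.execute("SELECT COUNT(*) FROM items").fetchone()[0]

print(deleted, remaining, left)
```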


---
### Basic Queries    
Ref: Databases/AC wise Voters list 2023.pdf  

1. On 25 Apr 2023, CEO-KA released AC-wise count of voters. KA_2305_VoterCounts.csv has the data.  
2. Create a database named 'Elections_2023.sqlite'   
3. Create a table named VoterCounts with columns District, AC_Num, AC_Name, Male, Female, Others, and Total. Use AC_Num as the Primary Key.
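
Steps 2 and 3 might look roughly like the sketch below. It is not the course's own loader: it uses an in-memory database instead of `Elections_2023.sqlite`, and an inline two-row CSV string (rows taken from the query output further down) instead of the real CSV file. The `pk` field of `PRAGMA table_info` is non-zero for primary-key columns, which gives a quick way to check the table definition.

```python
import csv
import io
import sqlite3

# Inline stand-in for the CSV file (assumption: same column order).
csv_text = """District,AC_Num,AC_Name,Male,Female,Others,Total
BAGALKOT,25,Hungund,111068,111367,11,222446
BIJAPUR,26,Muddebihal,110091,106897,21,217009
"""

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE VoterCounts (
        District TEXT, AC_Num INTEGER PRIMARY KEY, AC_Name TEXT,
        Male INTEGER, Female INTEGER, Others INTEGER, Total INTEGER)
""")
rows = [
    (r["District"], int(r["AC_Num"]), r["AC_Name"],
     int(r["Male"]), int(r["Female"]), int(r["Others"]), int(r["Total"]))
    for r in csv.DictReader(io.StringIO(csv_text))
]
con.executemany("INSERT INTO VoterCounts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
con.commit()

# Column 1 of table_info is the name, column 5 is the pk flag.
pk_columns = [r[1] for r in con.execute("PRAGMA table_info(VoterCounts)") if r[5]]

# A primary key rejects a second row with the same AC_Num.
try:
    con.execute("INSERT INTO VoterCounts VALUES ('BAGALKOT', 25, 'Duplicate', 0, 0, 0, 0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(pk_columns, duplicate_rejected)
```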


In [1]:
import pandas as pd
import sqlite3 as lite
import re
import os
from glob import glob

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [3]:
con = lite.connect("G:/.shortcut-targets-by-id/1aBWQaFQhja65Ljp3SLszLrSLfIbp0hjJ/DBMS_Course/Databases/Elections_2023.sqlite")
cur = con.cursor()

In [4]:
pd.read_sql("SELECT * FROM sqlite_master", con)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,VoterCounts,VoterCounts,2,CREATE TABLE VoterCounts \n (District ...


In [5]:
pd.read_sql("PRAGMA table_info(VoterCounts)", con)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,District,TEXT,0,,0
1,1,AC_Num,INTEGER,0,,0
2,2,AC_Name,TEXT,0,,0
3,3,Male,INTEGER,0,,0
4,4,Female,INTEGER,0,,0
5,5,Others,INTEGER,0,,0
6,6,Total,INTEGER,0,,0


---  
### Queries    
##### Find the number of rows in the table      

In [6]:
sql = "SELECT COUNT(*) FROM VoterCounts"
print(f"Total Rows: {list(con.execute(sql))[0][0]}") 

Total Rows: 224


##### Filter rows by position (LIMIT offset, count)   

In [7]:
sql = """
  SELECT * FROM VoterCounts 
  LIMIT 24,5
"""  
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total
0,BAGALKOT,25,Hungund,111068,111367,11,222446
1,BIJAPUR,26,Muddebihal,110091,106897,21,217009
2,BIJAPUR,27,Devara Hippargi,112709,106402,20,219131
3,BIJAPUR,28,Basavana Bagevadi,107003,102799,12,209814
4,BIJAPUR,29,Babaleshwar,110801,106346,4,217151
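
`LIMIT 24,5` is the short form of `LIMIT 5 OFFSET 24`: skip 24 rows, then return 5. The sketch below checks the equivalence on a made-up table `t` holding the numbers 1 to 30 (a hypothetical table, not part of the course database). It adds an ORDER BY because SQL does not guarantee row order without one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table with the integers 1..30.
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(1, 31)])

# Short form: LIMIT offset, count
short_form = [r[0] for r in con.execute("SELECT n FROM t ORDER BY n LIMIT 24, 5")]
# Long form: LIMIT count OFFSET offset
long_form = [r[0] for r in con.execute("SELECT n FROM t ORDER BY n LIMIT 5 OFFSET 24")]

print(short_form, long_form)
```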


##### Task   
Vary the limit values and check results.  
What happens if the numbers are higher than the count of rows?  
Negative numbers?  

#### Voter Counts by Gender - aggregation    
Sum of column values   

In [20]:
def fetch_one():
    sql = "SELECT SUM(Female) FROM VoterCounts"
    print(f"Female Voters: {list(con.execute(sql))[0][0]:_}")

    sql = "SELECT SUM(Male) FROM VoterCounts"
    print(f"Male Voters:   {list(con.execute(sql))[0][0]:,}")

    sql = "SELECT SUM(Others) FROM VoterCounts"
    print(f"Other Voters:       {list(con.execute(sql))[0][0]:,}")

    sql = "SELECT SUM(Total) FROM VoterCounts"
    print(f"Total Voters:  {list(con.execute(sql))[0][0]:,}")
    
fetch_one()    

Female Voters: 26_398_483
Male Voters:   26,682,156
Other Voters:       4,927
Total Voters:  53,085,566


In [9]:
def fetch_many():
    sql = """SELECT SUM(Female) as Females, 
             SUM(Male) as Males,  
             SUM(Others) as Others, 
             SUM(Total) as Total 
             FROM VoterCounts"""
    print(list(con.execute(sql)))
    return pd.read_sql(sql, con)

fetch_many()

[(26398483, 26682156, 4927, 53085566)]


Unnamed: 0,Females,Males,Others,Total
0,26398483,26682156,4927,53085566
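
As an alternative to `list(con.execute(sql))[0][0]`, a single-row result can be read with `fetchone()`, and `sqlite3.Row` allows access by the column alias. This is a sketch on a two-row, two-column stand-in for `VoterCounts` (values taken from the LIMIT output above), not a replacement for the cells above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows support access by column name
# Reduced stand-in table: only the two columns used here.
con.execute("CREATE TABLE VoterCounts (Male INTEGER, Female INTEGER)")
con.executemany("INSERT INTO VoterCounts VALUES (?, ?)",
                [(111068, 111367), (110091, 106897)])

row = con.execute(
    "SELECT SUM(Female) AS Females, SUM(Male) AS Males FROM VoterCounts"
).fetchone()

# The same value by alias and by position.
females_by_name = row["Females"]
females_by_index = row[0]
print(f"Female Voters: {females_by_name:,}  Male Voters: {row['Males']:,}")
```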


---  
#### Compare Performance  

In [27]:
def fetch_one_():
    sql = "SELECT SUM(Female) FROM VoterCounts"
    con.execute(sql)

    sql = "SELECT SUM(Male) FROM VoterCounts"
    con.execute(sql)

    sql = "SELECT SUM(Others) FROM VoterCounts"
    con.execute(sql)

    sql = "SELECT SUM(Total) FROM VoterCounts"
    con.execute(sql)
  
def fetch_many_():
    sql = """SELECT SUM(Female) as Females, 
             SUM(Male) as Males,  
             SUM(Others) as Others, 
             SUM(Total) as Total 
             FROM VoterCounts"""
    con.execute(sql)

In [29]:
print("\nperformance with four separate queries:")
%timeit fetch_one_()
print("\nperformance with single query:")
%timeit fetch_many_()


performance with four separate queries:
3.78 ms ± 434 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

performance with single query:
1.08 ms ± 89.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
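
`%timeit` is an IPython magic. Outside a notebook, a similar comparison can be sketched with the standard `timeit` module. The function names `four_queries` and `one_query`, the two-row stand-in table, and the repeat count of 200 are assumptions for illustration; absolute timings will differ by machine and table size.

```python
import sqlite3
import timeit

con = sqlite3.connect(":memory:")
# Reduced stand-in for VoterCounts with two rows from the output above.
con.execute("CREATE TABLE VoterCounts (Male INTEGER, Female INTEGER, Others INTEGER, Total INTEGER)")
con.executemany("INSERT INTO VoterCounts VALUES (?, ?, ?, ?)", [
    (111068, 111367, 11, 222446),
    (110091, 106897, 21, 217009),
])

def four_queries():
    """One table scan per column."""
    return tuple(
        con.execute(f"SELECT SUM({col}) FROM VoterCounts").fetchone()[0]
        for col in ("Female", "Male", "Others", "Total")
    )

def one_query():
    """All four sums in a single scan."""
    return con.execute(
        "SELECT SUM(Female), SUM(Male), SUM(Others), SUM(Total) FROM VoterCounts"
    ).fetchone()

t_four = timeit.timeit(four_queries, number=200)
t_one = timeit.timeit(one_query, number=200)
print(f"four queries: {t_four:.4f} s, single query: {t_one:.4f} s for 200 runs each")
```

Both functions return the same four sums; only the number of statements differs.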


---

#### Bangalore Voters  
Find the number of voters - total and by gender - in Bangalore districts,   
which include Bangalore Urban and BBMP districts.  

##### Number of constituencies in Bangalore districts   
WHERE clause filter   
including boolean conditions   

In [34]:
sql = """SELECT COUNT(*) as Constituencies, 
         MIN(AC_Num) as AC_From,  
         MAX(AC_Num) as AC_To   
         FROM VoterCounts  
         WHERE District like 'Bangalore URBAN%'   -- where
         OR District like 'B.B.M.P%'           -- OR where
"""         

res = list(con.execute(sql))         
print("\nConstituencies in Bangalore Urban and B.B.M.P Districts:", end='')         
print(f" {res[0][0]}, \nfrom AC-{res[0][1]} to AC-{res[0][2]}")



Constituencies in Bangalore Urban and B.B.M.P Districts: 28, 
from AC-150 to AC-177
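
The pattern `'Bangalore URBAN%'` matches the stored value `BANGALORE URBAN` because SQLite's `LIKE` is case-insensitive for ASCII letters by default, whereas `=` and `GLOB` are case-sensitive. A small sketch on string literals, with no table needed:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return con.execute(sql).fetchone()[0]

like_match = scalar("SELECT 'BANGALORE URBAN' LIKE 'Bangalore URBAN%'")
equals_match = scalar("SELECT 'BANGALORE URBAN' = 'Bangalore URBAN'")
glob_match = scalar("SELECT 'BANGALORE URBAN' GLOB 'Bangalore URBAN*'")
rural_match = scalar("SELECT 'BANGALORE RURAL' LIKE 'Bangalore URBAN%'")

# SQLite reports true as 1 and false as 0.
print(like_match, equals_match, glob_match, rural_match)
```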


#### Voter count by district  
GROUP BY operation to aggregate by groups   
ORDER BY to sort    

In [17]:
sql = """SELECT 
         District, 
         SUM(Total) as Voters                  -- aggregation, voter counts
         FROM VoterCounts           
         GROUP BY District                     -- for the rows in each district
         ORDER BY Voters
    """
pd.read_sql(sql, con)

Unnamed: 0,District,Voters
0,KODAGU,456313
1,CHAMARAJNAGAR,861489
2,GADAG,867955
3,BANGALORE RURAL,877890
4,RAMANAGARAM,904702
5,CHIKKMAGALUR,973238
6,YADGIR,999959
7,UDUPI,1041672
8,CHIKKABALLAPUR,1050142
9,VIJAYANAGARA,1092011


##### Task   
Edit the query clauses and check the results  

#### Voter count by Bangalore districts    

In [18]:
sql = """SELECT 
         District, 
         SUM(Total) as Voters                    -- aggregation, voter counts
         FROM VoterCounts  
         WHERE District like 'Bangalore URBAN%'  -- filter for Bangalore districts 
         OR District like 'B.B.M.P%'
         GROUP BY District                     -- for the rows in each district
    """
pd.read_sql(sql, con)

Unnamed: 0,District,Voters
0,B.B.M.P(CENTRAL),1790650
1,B.B.M.P(NORTH),2195968
2,B.B.M.P(SOUTH),2052336
3,BANGALORE URBAN,3674395


##### Task   
Modify the above query to include the counts of male and female voters   
and the % of females for each district.   

#### Total voters in Bangalore districts  

In [44]:
sql = """SELECT SUM(Female) as Females, 
         SUM(Male) as Males,  
         SUM(Others) as Others, 
         SUM(Total) as Total,
         (100 * SUM(Female) / SUM(Total)) as Female_Perc
         FROM VoterCounts  
         WHERE District like 'Bangalore URBAN%'  
         OR District like 'B.B.M.P%'"""

print("\nVoters in Bangalore Districts:")

print(list(con.execute(sql))); print()
pd.read_sql(sql, con)


Voters in Bangalore Districts:
[(4686813, 5024775, 1761, 9713349, 48)]



Unnamed: 0,Females,Males,Others,Total,Female_Perc
0,4686813,5024775,1761,9713349,48


#### List the voters in constituencies, ordered by size   
ORDER BY Total  `[ DESC | ASC ]`

In [30]:
sql = """SELECT * FROM VoterCounts  
         ORDER BY Total DESC"""
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total
0,BANGALORE URBAN,176,Bangalore South,366719,328920,103,695742
1,BANGALORE URBAN,174,Mahadevapura,326692,280346,120,607158
2,BANGALORE URBAN,153,Yeshwanthapura,287823,276266,83,564172
3,B.B.M.P(NORTH),151,K.R. Pura,266236,244188,167,510591
4,BANGALORE URBAN,152,Byatarayanapura,262739,245937,123,508799
...,...,...,...,...,...,...,...
219,UTTARA KANNADA,81,Yellapur,92238,90118,0,182356
220,TUMKUR,135,Gubbi,90483,90592,11,181086
221,UTTARA KANNADA,76,Haliyal,91132,89899,2,181033
222,CHIKKMAGALUR,124,Mudigere,84094,87898,6,171998


#### Find Female to Male ratio of voters   
in Karnataka   
in Bangalore Districts  

##### Percentage of Female voters in Karnataka 

In [45]:
sql = """SELECT 
         100*(SUM(Female))/SUM(Total) 
         FROM VoterCounts;"""
print(f"{list(con.execute(sql))[0][0]}% of Karnataka voters are female")

49% of Karnataka voters are female


##### Percentage of Female voters in Bangalore 

In [46]:
sql = """SELECT 
         100*(SUM(Female))/SUM(Total) 
         FROM VoterCounts 
         WHERE AC_Num >= 150 AND AC_Num <= 177;"""
print(f"{list(con.execute(sql))[0][0]}% of Bangalore voters are female")


48% of Bangalore voters are female


#### CAST AS    
##### Female to male ratio  
Use CAST AS FLOAT because SQL integer division truncates, so any ratio below 1 comes out as 0.

In [50]:
print(f"Python division of integers with / returns a float: {3/4}")
print(f'SQL integer division returns an integer: {con.execute("SELECT 3/4").fetchone()[0]}')

Python division of integers with / returns a float: 0.75
SQL integer division returns an integer: 0


In [49]:
sql = """SELECT 
         CAST(SUM(Female) AS FLOAT)/
         CAST(SUM(Male) AS FLOAT) 
         AS Gender_Ratio  
         FROM VoterCounts;"""
print(f"Female to Male voter ratio in Karnataka: {round(list(con.execute(sql))[0][0], 2)}  ")  

sql = """SELECT 
         CAST(SUM(Female) AS FLOAT)/
         CAST(SUM(Male) AS FLOAT) 
         AS Gender_Ratio  
         FROM VoterCounts
         WHERE AC_Num >= 150 AND AC_Num <= 177;"""
print(f"Female to Male voter ratio in Bangalore: {round(list(con.execute(sql))[0][0], 2)}  ")  

Female to Male voter ratio in Karnataka: 0.99  
Female to Male voter ratio in Bangalore: 0.93  
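
Besides `CAST`, making one operand a real number (for example `100.0` instead of `100`) is enough to get real division. The sketch below compares the variants on literals. The figures 98430 and 188952 are the Female and Total counts of the Kaup constituency shown in the outputs of this notebook.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return con.execute(sql).fetchone()[0]

int_div = scalar("SELECT 3/4")                    # integer / integer truncates
real_div = scalar("SELECT CAST(3 AS REAL)/4")     # one real operand is enough
int_perc = scalar("SELECT 100 * 98430 / 188952")  # truncated percentage
real_perc = scalar("SELECT ROUND(100.0 * 98430 / 188952, 2)")

print(int_div, real_div, int_perc, real_perc)
```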


#### List the constituencies ordered by Female share of voters  
- In SQL, integer division returns an integer  
- SQL operators also work on literal values, not only on table columns (e.g., `SELECT 3/4`)  

In [26]:
sql = """SELECT *, 
         ROUND(CAST (Female AS float)/
         CAST (Total AS float), 2) 
         as Female_ratio 
         FROM VoterCounts
         ORDER BY Female_Ratio DESC;""" 
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total,Female_ratio
0,UDUPI,119,Kundapur,100751,108784,2,209537,0.52
1,UDUPI,120,Udupi,104787,112148,3,216938,0.52
2,UDUPI,121,Kaup,90517,98430,5,188952,0.52
3,UDUPI,122,Karkala,91435,99142,0,190577,0.52
4,RAMANAGARAM,185,Channapatna,111530,118787,10,230327,0.52
...,...,...,...,...,...,...,...,...
219,BANGALORE URBAN,155,Dasarahalli,243448,213521,77,457046,0.47
220,B.B.M.P(NORTH),161,C.V.RamannNagar,142565,128529,117,271211,0.47
221,BANGALORE URBAN,176,Bangalore South,366719,328920,103,695742,0.47
222,BANGALORE URBAN,174,Mahadevapura,326692,280346,120,607158,0.46


In [51]:
sql = """SELECT *, 100*Female/Total as Female_Perc 
         FROM VoterCounts
         ORDER BY Female_Perc DESC;""" 
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total,Female_Perc
0,UDUPI,121,Kaup,90517,98430,5,188952,52
1,UDUPI,122,Karkala,91435,99142,0,190577,52
2,DAKSHINA KANNADA,203,Mangalore City South,117475,128222,47,245744,52
3,RAICHUR,55,Manvi,114498,119261,64,233823,51
4,RAICHUR,58,Sindhanur,116916,122309,22,239247,51
...,...,...,...,...,...,...,...,...
219,BANGALORE URBAN,176,Bangalore South,366719,328920,103,695742,47
220,BANGALORE URBAN,177,Anekal,210826,191683,86,402595,47
221,B.B.M.P(SOUTH),175,Bommanahalli,242406,210145,70,452621,46
222,BANGALORE URBAN,155,Dasarahalli,243448,213521,77,457046,46


#### List ten constituencies with the highest Female % of voters  

In [52]:
sql = """SELECT *, 100*Female/Total as Female_Perc 
         FROM VoterCounts
         ORDER BY Female_Perc DESC 
         LIMIT 10;""" 
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total,Female_Perc
0,UDUPI,121,Kaup,90517,98430,5,188952,52
1,UDUPI,122,Karkala,91435,99142,0,190577,52
2,DAKSHINA KANNADA,203,Mangalore City South,117475,128222,47,245744,52
3,RAICHUR,55,Manvi,114498,119261,64,233823,51
4,RAICHUR,58,Sindhanur,116916,122309,22,239247,51
5,BELLARY,93,Bellary,116096,122181,49,238326,51
6,BELLARY,94,Bellary City,126067,133087,30,259184,51
7,SHIMOGA,112,Bhadravathi,103198,108962,5,212165,51
8,SHIMOGA,113,Shimoga,127441,133249,14,260704,51
9,UDUPI,118,Baindur,115346,120319,3,235668,51




#### List ten constituencies with the lowest Female % of voters  



In [53]:
sql = """SELECT *, 100*Female/Total as Female_Perc 
         FROM VoterCounts
         ORDER BY Female_Perc ASC
         LIMIT 10;""" 
pd.read_sql(sql, con)

Unnamed: 0,District,AC_Num,AC_Name,Male,Female,Others,Total,Female_Perc
0,B.B.M.P(SOUTH),175,Bommanahalli,242406,210145,70,452621,46
1,BANGALORE URBAN,155,Dasarahalli,243448,213521,77,457046,46
2,BANGALORE URBAN,174,Mahadevapura,326692,280346,120,607158,46
3,GULBARGA,46,Aland,125970,115474,34,241478,47
4,BIDAR,47,Basavakalyan,128287,118087,5,246379,47
5,BIDAR,51,Bhalki,120172,110546,3,230721,47
6,B.B.M.P(NORTH),151,K.R. Pura,266236,244188,167,510591,47
7,B.B.M.P(NORTH),161,C.V.RamannNagar,142565,128529,117,271211,47
8,B.B.M.P(SOUTH),172,B.T.M Layout,143266,131717,45,275028,47
9,BANGALORE URBAN,176,Bangalore South,366719,328920,103,695742,47
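
`Female_Perc` is an integer percentage of the total, so rows that share a value (such as the three rows at 46 above) are tied, and their relative order is not determined by the ORDER BY. If the Female to Male ratio itself is wanted, one option is to sort on a real-valued `Female / Male` expression. A sketch using those three rows, with a reduced three-column table and a column alias `FM_Ratio` chosen for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reduced stand-in table with the three rows listed at 46% above.
con.execute("CREATE TABLE VoterCounts (AC_Name TEXT, Male INTEGER, Female INTEGER)")
con.executemany("INSERT INTO VoterCounts VALUES (?, ?, ?)", [
    ("Bommanahalli", 242406, 210145),
    ("Dasarahalli", 243448, 213521),
    ("Mahadevapura", 326692, 280346),
])

# The integer percentage of the total is 46 for all three rows.
int_percs = [r[0] for r in con.execute(
    "SELECT 100 * Female / (Male + Female) FROM VoterCounts")]

# The real-valued ratio separates them.
sql = """
    SELECT AC_Name, ROUND(CAST(Female AS REAL) / Male, 3) AS FM_Ratio
    FROM VoterCounts
    ORDER BY CAST(Female AS REAL) / Male ASC
"""
ranked = con.execute(sql).fetchall()
for name, ratio in ranked:
    print(name, ratio)
```

The percentage here divides by `Male + Female` because the reduced table has no `Total` column. The small Others counts do not change the truncated value for these rows.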


#### List the district names  
DISTINCT keyword selects unique values  

In [54]:
# sql = "SELECT DISTINCT District FROM VoterCounts;"
sql = """
SELECT DISTINCT District
FROM VoterCounts;"""

for dist in (list(con.execute(sql))):
  print(dist[0])

BELGAUM
BAGALKOT
BIJAPUR
GULBARGA
BIDAR
RAICHUR
KOPPAL
GADAG
DHARWAD
UTTARA KANNADA
HAVERI
BELLARY
CHITRADURGA
DAVANGERE
SHIMOGA
UDUPI
CHIKKMAGALUR
TUMKUR
CHIKKABALLAPUR
KOLAR
BANGALORE RURAL
RAMANAGARAM
MANDYA
HASSAN
DAKSHINA KANNADA
KODAGU
MYSORE
CHAMARAJNAGAR
B.B.M.P(CENTRAL)
B.B.M.P(NORTH)
B.B.M.P(SOUTH)
BANGALORE URBAN
YADGIR
VIJAYANAGARA


#### List counts by district, in descending order of size      

In [48]:
sql = """SELECT District, 
         SUM(Male) as Male, 
         SUM(Female) as Female, 
         SUM(Others) as Other, 
         SUM(Total) as Total 
         FROM VoterCounts 
         GROUP BY District
         ORDER BY Total DESC;"""
pd.read_sql(sql, con)

Unnamed: 0,District,Male,Female,Other,Total
0,BELGAUM,1990856,1956143,151,3947150
1,BANGALORE URBAN,1921776,1751945,674,3674395
2,MYSORE,1317121,1338637,230,2655988
3,TUMKUR,1120698,1127126,108,2247932
4,GULBARGA,1121972,1095754,329,2218055
5,B.B.M.P(NORTH),1124272,1071209,487,2195968
6,B.B.M.P(SOUTH),1063764,988206,366,2052336
7,BIJAPUR,966535,926096,221,1892852
8,B.B.M.P(CENTRAL),914963,875453,234,1790650
9,DAKSHINA KANNADA,870991,910314,84,1781389


#### For Bangalore, list counts by district, in descending order of size   

In [49]:
sql = """SELECT District, 
         SUM(Male) as Male, 
         SUM(Female) as Female, 
         SUM(Others) as Other, 
         SUM(Total) as Total 
         FROM VoterCounts 
         WHERE AC_Num >= 150 AND AC_Num <= 177 
         GROUP BY District
         ORDER BY Total DESC;"""
pd.read_sql(sql, con)

Unnamed: 0,District,Male,Female,Other,Total
0,BANGALORE URBAN,1921776,1751945,674,3674395
1,B.B.M.P(NORTH),1124272,1071209,487,2195968
2,B.B.M.P(SOUTH),1063764,988206,366,2052336
3,B.B.M.P(CENTRAL),914963,875453,234,1790650
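
The range filter `AC_Num >= 150 AND AC_Num <= 177` can also be written with `BETWEEN`, which includes both end points. A sketch on a reduced two-column table, using four constituency numbers and names that appear in the outputs of this notebook:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reduced stand-in table: constituency number and name only.
con.execute("CREATE TABLE VoterCounts (AC_Num INTEGER, AC_Name TEXT)")
con.executemany("INSERT INTO VoterCounts VALUES (?, ?)", [
    (29, "Babaleshwar"),
    (151, "K.R. Pura"),
    (177, "Anekal"),
    (185, "Channapatna"),
])

with_and = [r[0] for r in con.execute(
    "SELECT AC_Num FROM VoterCounts WHERE AC_Num >= 150 AND AC_Num <= 177 ORDER BY AC_Num")]
with_between = [r[0] for r in con.execute(
    "SELECT AC_Num FROM VoterCounts WHERE AC_Num BETWEEN 150 AND 177 ORDER BY AC_Num")]

print(with_and, with_between)
```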


---
##### Task   
1. From your implementation of 0102_lab, check distinct values of   
 ['Sex', 'Home_State', 'Home_Town', 'Mother_Tongue', 'Elective', 'Clubs']    
2. Check the min, max, and mean heights of boys and girls for the whole class and also by section.  
3. Print name, USN, section, home_state and mother_tongue of students from Karnataka with mother_tongue other than Kannada.  
4. Get count of students for each elective by section (A, B, C).  