## SQL - part 3

In [1]:
from sqlite3 import connect

'''    
    Establish a connection to the database.
    This statement creates the file at the given path if it does not exist.
    The file was provided so the statement should just establish the connection.
'''
connection = connect('../datasets/org.Hs.eg.sqlite')
cursor = connection.cursor()


#### Major SQL commands: SELECT, INSERT, DELETE, UPDATE
#### SELECT - Retrieves data from one or more tables and doesn’t change the data at all 

* SELECT  * (means all columns), or the comma separated names of the columns of data you wish to return
    * Returns columns (left to right) in the order received. 
    * '*' selects ALL columns, in the order in which they are defined in the table; which rows are returned is decided by the WHERE clause
* FROM is the table source or sources (comma separated)
* WHERE (optional) is the predicate clause: conditions for the query
    * Evaluates to True or False for each row
    * This clause almost always includes Column-Value pairs.
    * Omitting the Where clause returns ALL the records in that table.
    * Note: in SQLite a text match with `=` is case sensitive by default (`LIKE` is case insensitive for ASCII characters)
* ORDER BY (optional) indicates a sort order for the output data 
    * without ORDER BY the row order is not guaranteed; in practice it often follows row_id, which can be very non-intuitive  
    * ASCending or DESCending can be appended to change the sort order.  (ASC is default)
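* GROUP BY (optional) groups the rows that share a value in a column (or columns), so that aggregate functions such as count or max return one summary row per group
* HAVING (optional) restricts which groups are returned, using conditions on the aggregated values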
    * a GROUP BY clause is required before HAVING
* LIMIT (optional) reduces the number of rows retrieved to the number provided after this clause
* In most SQL clients, the ";" indicates the end of a statement and requests execution
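
The clauses listed above can be combined in a single statement. The cell below is a minimal sketch on a throwaway in-memory database; the table `go_demo` and all of its rows are made up for illustration and are not part of `org.Hs.eg.sqlite`.

```python
import sqlite3

# Toy in-memory database: table name and rows are invented for this example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE go_demo (gene TEXT, go_id TEXT, evidence TEXT)")
cur.executemany(
    "INSERT INTO go_demo VALUES (?, ?, ?)",
    [
        ("A1BG", "GO:0000001", "TAS"),
        ("A1BG", "GO:0000002", "ND"),
        ("A2M", "GO:0000003", "IDA"),
        ("A2M", "GO:0000004", "TAS"),
        ("A2M", "GO:0000005", "TAS"),
        ("ADA", "GO:0000006", "IDA"),
        ("ADA", "GO:0000007", "IDA"),
        ("NAT2", "GO:0000008", "TAS"),
    ],
)

cur.execute(
    """
    SELECT gene, count(go_id) AS n_terms   -- columns to return
    FROM go_demo                           -- source table
    WHERE evidence != 'ND'                 -- row filter, applied before grouping
    GROUP BY gene                          -- one output row per gene
    HAVING n_terms >= 2                    -- group filter, applied after grouping
    ORDER BY n_terms DESC, gene ASC        -- explicit sort order
    LIMIT 2;                               -- return at most two rows
    """
)
select_rows = cur.fetchall()
print(select_rows)
conn.close()
```

WHERE removes the `ND` row before the groups are built, and HAVING then drops the genes that are left with fewer than two terms.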


In [3]:
def get_header(cursor):
    '''
    Makes a tab delimited header row from the cursor description.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string consisting of the column names separated by tabs, no new line
    '''
    return '\t'.join([row[0] for row in cursor.description])

def get_results(cursor):
    '''
    Makes a tab delimited table from the cursor results.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string with one line per result row, the values in each row separated by tabs
    ''' 
    res = list()
    for row in cursor.fetchall():        
        res.append('\t'.join(list(map(str,row))))
    return "\n".join(res)


In [4]:
# In every SQLite database, there is a special table: sqlite_master
# sqlite_master -  describes the contents of the database

sql = '''
SELECT type, name 
FROM sqlite_master 
WHERE type = "table";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

type	name
table	metadata
table	map_metadata
table	map_counts
table	genes
table	gene_info
table	chromosomes
table	accessions
table	cytogenetic_locations
table	omim
table	refseq
table	pubmed
table	unigene
table	chrlengths
table	go_bp
table	go_mf
table	go_cc
table	go_bp_all
table	go_mf_all
table	go_cc_all
table	kegg
table	ec
table	chromosome_locations
table	pfam
table	prosite
table	alias
table	ensembl
table	ensembl2ncbi
table	ncbi2ensembl
table	ensembl_prot
table	ensembl_trans
table	uniprot
table	ucsc
table	sqlite_stat1
table	sqlite_stat4
table	sqlite_sequence


#### A `PRIMARY KEY` is a very important concept to understand.  
* It is the designation for a column or a set of columns from a table.
* It is recommended to be a serial value and not something related to the business needs of the data in the table.

* A primary key is used to uniquely identify a row of data; combined with a column name, it uniquely locates a data entry
* A primary key by definition must be `UNIQUE` and `NOT NULL` 
* The primary key of a table should be a (sequential) non-repeating and not null value  
* Primary keys are generally identified at the time of table creation  
* A common method for generating a primary key is to declare the column as `INTEGER PRIMARY KEY AUTOINCREMENT`, so that a new value is generated whenever a row is inserted into the table
* Primary keys can be a composite of 2 or more columns that together uniquely identify the data in the table
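
As a small illustration of the last point, the sketch below declares a composite primary key on a hypothetical table `gene_go_demo` (name and values are invented) in an in-memory database and shows that a repeated key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table: the pair (gene_id, go_id) is the composite primary key.
cur.execute(
    """
    CREATE TABLE gene_go_demo (
        gene_id INTEGER NOT NULL,
        go_id   TEXT    NOT NULL,
        PRIMARY KEY (gene_id, go_id)
    )
    """
)
cur.execute("INSERT INTO gene_go_demo VALUES (1, 'GO:0002576')")
# Same gene, different GO term: the pair is new, so this is allowed.
cur.execute("INSERT INTO gene_go_demo VALUES (1, 'GO:0008150')")

try:
    # Repeats an existing (gene_id, go_id) pair, so the key constraint rejects it.
    cur.execute("INSERT INTO gene_go_demo VALUES (1, 'GO:0002576')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("rejected:", err)

row_count = cur.execute("SELECT count(*) FROM gene_go_demo").fetchone()[0]
print(row_count)
conn.close()
```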



#### A `FOREIGN KEY` is a column(s) that points to the `PRIMARY KEY` of another table 

* The purpose of the foreign key is to ensure referential integrity of the data. 
In other words, only values that are supposed to appear in the database are permitted.<br>
Only the values that exist in the `PRIMARY KEY` column are allowed to be present in the `FOREIGN KEY` column.
* Foreign keys are also the basis for how tables are joined and how relationships are represented in the database
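
One SQLite detail is worth a quick experiment: foreign key constraints are only enforced after `PRAGMA foreign_keys = ON` has been run on the connection (enforcement is off by default). The sketch below uses two made-up tables, `genes_demo` and `go_demo`, in an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Enforcement is off by default in SQLite and has to be enabled per connection.
cur.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent and child tables, loosely modelled on genes and go_bp.
cur.execute("CREATE TABLE genes_demo (_id INTEGER PRIMARY KEY, gene_id TEXT)")
cur.execute(
    """
    CREATE TABLE go_demo (
        _id   INTEGER NOT NULL,
        go_id TEXT    NOT NULL,
        FOREIGN KEY (_id) REFERENCES genes_demo (_id)
    )
    """
)
cur.execute("INSERT INTO genes_demo VALUES (1, '1')")
cur.execute("INSERT INTO go_demo VALUES (1, 'GO:0002576')")  # parent row 1 exists

try:
    # There is no row with _id = 99 in genes_demo, so this insert is refused.
    cur.execute("INSERT INTO go_demo VALUES (99, 'GO:0008150')")
    orphan_rejected = False
except sqlite3.IntegrityError as err:
    orphan_rejected = True
    print("rejected:", err)

conn.close()
```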


#### JOIN tables

* Multiple tables contain different data that we want to retrieve from a single query
* In order to assemble data as part of a query, a JOIN between tables is needed
* This is a very common practice, since it’s rare for all the data you want to be in a single table


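* INNER JOIN - returns only those rows where there is matching content in BOTH tables (this is the default when JOIN is used)
* OUTER JOIN - also returns the rows that have no match in the other table, with NULL in the columns of the missing side; LEFT OUTER JOIN keeps all rows of the left table (RIGHT and FULL OUTER JOIN are only available in SQLite 3.39.0 or later)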
* SELF JOIN - can be used to join a table to itself (through aliasing), to compare data internal to the table

```sql
SELECT ... FROM table1 [INNER] JOIN table2 ON conditional_expression
```
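
To make the difference between the join types concrete, here is a minimal sketch with two invented tables, `gene_demo` and `go_demo`, where one gene (the made-up symbol `NOGO`) has no GO term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Toy tables; the symbol NOGO is invented and has no matching row in go_demo.
cur.execute("CREATE TABLE gene_demo (_id INTEGER PRIMARY KEY, symbol TEXT)")
cur.execute("CREATE TABLE go_demo (_id INTEGER, go_id TEXT)")
cur.executemany(
    "INSERT INTO gene_demo VALUES (?, ?)",
    [(1, "A1BG"), (2, "A2M"), (3, "NOGO")],
)
cur.executemany(
    "INSERT INTO go_demo VALUES (?, ?)",
    [(1, "GO:0002576"), (2, "GO:0001869")],
)

# INNER JOIN: only genes that have at least one matching go_demo row.
inner_rows = cur.execute(
    """
    SELECT gi.symbol, go.go_id
    FROM gene_demo AS gi
    INNER JOIN go_demo AS go ON gi._id = go._id
    ORDER BY gi._id
    """
).fetchall()

# LEFT OUTER JOIN: every gene, with NULL (None in Python) where no term matches.
left_rows = cur.execute(
    """
    SELECT gi.symbol, go.go_id
    FROM gene_demo AS gi
    LEFT OUTER JOIN go_demo AS go ON gi._id = go._id
    ORDER BY gi._id
    """
).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```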


In [8]:
sql = '''
SELECT *
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id = go._id
--WHERE evidence = "ND"
LIMIT 5;'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

_id	gene_name	symbol	_id	go_id	evidence
1	alpha-1-B glycoprotein	A1BG	1	GO:0002576	TAS
1	alpha-1-B glycoprotein	A1BG	1	GO:0008150	ND
1	alpha-1-B glycoprotein	A1BG	1	GO:0043312	TAS
2	alpha-2-macroglobulin	A2M	2	GO:0001869	IDA
2	alpha-2-macroglobulin	A2M	2	GO:0002576	TAS


In [16]:
# gene information for gene with the max number of associated go terms
sql = '''
SELECT gi.symbol, go_term_no
FROM gene_info AS gi
INNER JOIN 
(SELECT _id, count(go_id) AS go_term_no
FROM go_bp
GROUP BY _id) AS go
ON gi._id == go._id
WHERE go_term_no IN 
(SELECT max(go_term_no) FROM
(SELECT _id, count(go_id) AS go_term_no
FROM go_bp
GROUP BY _id));
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

symbol	go_term_no
TGFB1	199


In [15]:
# _id and go term count for the gene with the max number of associated go terms
sql = '''
SELECT _id, count(go_id) AS go_term_no1
FROM go_bp
GROUP BY _id
HAVING go_term_no1 IN
(
SELECT max(go_term_no)
FROM
(SELECT _id, count(go_id) AS go_term_no
FROM go_bp
GROUP BY _id
)
);
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

_id	go_term_no1
5710	199


In [70]:
# gene information for gene with the max number of associated go terms
sql = '''
SELECT * 
FROM gene_info
WHERE _id IN
(
SELECT _id FROM
(
SELECT _id, count(go_id) AS go_term_no1
FROM go_bp
GROUP BY _id
HAVING go_term_no1 IN
(
SELECT max(go_term_no)
FROM
(SELECT _id, count(go_id) AS go_term_no
FROM go_bp
GROUP BY _id
)
)));
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

_id	gene_name	symbol
5710	transforming growth factor beta 1	TGFB1


#### See the create table statement

In [17]:
# sql column in the sqlite_master table

sql = '''
SELECT sql
FROM sqlite_master 
WHERE type= "table" and name == "go_bp";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

sql
CREATE TABLE go_bp (
      _id INTEGER NOT NULL,                         -- REFERENCES  genes 
      go_id CHAR(10) NOT NULL,                      -- GO ID
      evidence CHAR(3) NOT NULL,                    -- GO evidence code
      FOREIGN KEY (_id) REFERENCES  genes  (_id)
    )


#### Guidelines for database design:

* Normalization is the process of creating or re-arranging data relationships so that it will be easy to store and retrieve data efficiently.  Data is normalized to achieve the following goals: 
    * Eliminate data redundancies and save space 
    * Make it easier to change data 
    * Simplify the enforcement of referential integrity constraints 
    * Produce a design that is a 'good' representation of the real world (one that is intuitively easy to understand and a good base for further growth)

    * Make it easier to change data by avoiding multiple values separated by commas in a single column
    * All columns in a table should depend on the primary key; all extra related information should be in other tables linked by foreign keys

https://support.microsoft.com/en-us/help/283878/description-of-the-database-normalization-basics

### CREATE TABLE  - statement
https://www.sqlitetutorial.net/sqlite-create-table/

```sql
CREATE TABLE [IF NOT EXISTS] [schema_name].table_name (
    column_1 data_type PRIMARY KEY,
    column_2 data_type NOT NULL,
    column_3 data_type DEFAULT 0,
    table_constraints
) [WITHOUT ROWID];
```

In this syntax:

* First, specify the name of the table that you want to create after the CREATE TABLE keywords. The name of the table cannot start with sqlite_ because it is reserved for the internal use of SQLite.
* Second, use the `IF NOT EXISTS` option to create the table only if it does not already exist. Attempting to create a table that already exists without using the IF NOT EXISTS option will result in an error.
* Third, optionally specify the schema_name to which the new table belongs. The schema can be the main database, temp database or any attached database.
* Fourth, specify the column list of the table. Each column has a name, data type, and the column constraint. SQLite supports `PRIMARY KEY, UNIQUE, NOT NULL`, and `CHECK` column constraints.
* Fifth, specify the table constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, and CHECK constraints.
* Finally, optionally use the `WITHOUT ROWID` option. By default, a row in a table has an implicit column, which is referred to as the rowid, oid or _rowid_ column. The rowid column stores a 64-bit signed integer key that uniquely identifies the row inside the table. If you don’t want SQLite to create the rowid column, specify the WITHOUT ROWID option. A table that contains the rowid column is known as a rowid table. Note that the WITHOUT ROWID option is only available in SQLite 3.8.2 or later.

https://www.sqlite.org/syntaxdiagrams.html#create-table-stmt

<img src = "https://www.sqlite.org/images/syntax/create-table-stmt.gif" width="800"/>

Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
https://www.sqlite.org/datatype3.html
* `NULL`. The value is a NULL value.
* `INTEGER`. The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
* `REAL`. The value is a floating point value, stored as an 8-byte IEEE floating point number.
* `TEXT`. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
* `BLOB`. The value is a blob of data, stored exactly as it was input.
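
The storage class of a value can be inspected with the built-in SQL function `typeof()`. The sketch below also stores a five-character string in a column declared as `CHAR(3)`, to show that SQLite does not enforce the declared length; the table `type_demo` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One literal per storage class; x'00ff' is a BLOB literal.
cur.execute(
    "SELECT typeof(NULL), typeof(42), typeof(3.14), typeof('abc'), typeof(x'00ff')"
)
storage_classes = cur.fetchone()
print(storage_classes)

# Hypothetical table: the declared length in CHAR(3) is not enforced by SQLite.
cur.execute("CREATE TABLE type_demo (evidence CHAR(3))")
cur.execute("INSERT INTO type_demo VALUES (?)", ("CM_EV",))
stored = cur.execute(
    "SELECT evidence, typeof(evidence), length(evidence) FROM type_demo"
).fetchone()
print(stored)
conn.close()
```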

The `sqlite_master` table has the following create statement: 
```sql
CREATE TABLE sqlite_master ( type TEXT, name TEXT, tbl_name TEXT, rootpage INTEGER, sql TEXT );
```

#### Create the table `go_bp_ALT` with the columns: `gene_go_id`, `gene_id`, `go_id`, `evidence` 

##### The `connection` object methods can be used to save or revert/reset the changes after a command that makes changes to the database
##### `COMMIT` - save the changes 
##### `ROLLBACK` - revert the changes 


In [18]:
sql='''
CREATE TABLE IF NOT EXISTS go_bp_ALT (
      gene_go_id INTEGER PRIMARY KEY AUTOINCREMENT,
      gene_id INTEGER NOT NULL,                     -- REFERENCES  genes _id 
      go_id CHAR(10) NOT NULL,                      -- GO ID
      evidence CHAR(30) NOT NULL,                   -- GO evidence information
      FOREIGN KEY (gene_id) REFERENCES  genes  (_id)
    );
'''
try:
    cursor.execute(sql)
except connection.DatabaseError:
    print("Creating the go_bp_ALT table resulted in a database error!")
    connection.rollback()
    raise
else:
    connection.commit()
finally:
    print("done!")
    
    

done!


##### Similar error handling, as seen above, can be used when executing any statement that changes the database.
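
A more compact alternative to the explicit try/except/else pattern is to use the connection as a context manager: the `with` block commits if it finishes normally and rolls back if an exception is raised (it does not close the connection). This is a sketch on an in-memory database with a made-up table `demo`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")

# The block ends without an error, so the insert is committed.
with conn:
    conn.execute("INSERT INTO demo (label) VALUES (?)", ("kept",))

try:
    # The second insert violates NOT NULL, so the whole block is rolled back,
    # including the first insert of this block.
    with conn:
        conn.execute("INSERT INTO demo (label) VALUES (?)", ("discarded",))
        conn.execute("INSERT INTO demo (label) VALUES (?)", (None,))
except sqlite3.IntegrityError as err:
    print("rolled back:", err)

labels = [row[0] for row in conn.execute("SELECT label FROM demo")]
print(labels)
conn.close()
```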

##### Check if the new table appears in the `sqlite_master` table 

In [19]:
sql = '''
SELECT name
FROM sqlite_master 
WHERE name LIKE "go_bp%"
LIMIT 4;
'''
cursor.execute(sql)
print(cursor.fetchall())

[('go_bp',), ('go_bp_all',), ('go_bp_ALT',)]


  
<br><br> 
The `sqlite_sequence` table is created and initialized automatically whenever a regular table is created if it has a column with the `AUTOINCREMENT` option set.<br>
https://www.sqlite.org/autoinc.html
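
The content of `sqlite_sequence` can be queried like any other table. The sketch below uses a hypothetical table `seq_demo` in an in-memory database and shows the row that SQLite maintains for it after a few inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table with an AUTOINCREMENT key, loosely modelled on go_bp_ALT.
cur.execute(
    """
    CREATE TABLE seq_demo (
        row_key INTEGER PRIMARY KEY AUTOINCREMENT,
        go_id   TEXT NOT NULL
    )
    """
)
cur.executemany(
    "INSERT INTO seq_demo (go_id) VALUES (?)",
    [("GO:0000001",), ("GO:0000002",), ("GO:0000003",)],
)
conn.commit()

# One row per AUTOINCREMENT table: (table name, largest key handed out so far).
sequence_rows = cur.execute("SELECT name, seq FROM sqlite_sequence").fetchall()
print(sequence_rows)
conn.close()
```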


##### Check if the `sqlite_sequence` table appears in the `sqlite_master` table 

In [21]:
sql = '''
SELECT name
FROM sqlite_master 
WHERE name LIKE "sqlite_%"
LIMIT 10;
'''
cursor.execute(sql)
print(cursor.fetchall())

[('sqlite_autoindex_metadata_1',), ('sqlite_autoindex_map_counts_1',), ('sqlite_autoindex_genes_1',), ('sqlite_autoindex_gene_info_1',), ('sqlite_autoindex_chrlengths_1',), ('sqlite_stat1',), ('sqlite_stat4',), ('sqlite_sequence',)]


### INDEXING

Indexes are lookup tables, like the index of a book.
They are usually created for columns that have unique or rarely repeated values, and they provide a way to quickly search 
the values.<br>
Indexing creates a copy of the indexed columns together with a link to the location of the rest of the row.<br> 
The index data is stored in a data structure that allows for fast searching, <br>
e.g. a balanced tree, in which every leaf is at most n nodes away from the root.<br>
When a query filters or joins on an indexed column, the database can search the index instead of scanning the whole table


* One important function in Relational Databases is to be able to create indexes on columns in tables  
* These indexes are pre-calculated and stored in the database 
* Indexes should be created on columns that are used in queries and joins
    * columns that appear in conditions (WHERE, JOIN ... ON) 
* They can greatly speed up queries, at the cost of extra storage and somewhat slower INSERT, UPDATE and DELETE statements (the index has to be kept up to date)

To create an index use the following command:

```sql
CREATE INDEX indexName ON tableName (columnName)
```
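
Whether a query actually uses an index can be checked with `EXPLAIN QUERY PLAN`. The sketch below compares the plan before and after creating an index on a made-up table `go_demo`; the exact wording of the plan text differs between SQLite versions, so only the presence of the index name is checked. The helper `plan` is defined here for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Toy table filled with generated, made-up GO identifiers.
cur.execute("CREATE TABLE go_demo (_id INTEGER, go_id TEXT, evidence TEXT)")
cur.executemany(
    "INSERT INTO go_demo VALUES (?, ?, ?)",
    [(i, f"GO:{i:07d}", "IDA") for i in range(1000)],
)
conn.commit()


def plan(cursor, query):
    """Return the descriptive text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in cursor.execute("EXPLAIN QUERY PLAN " + query)]


query = "SELECT * FROM go_demo WHERE go_id = 'GO:0000500'"

plan_before = plan(cur, query)   # without an index: a scan of the whole table
cur.execute("CREATE INDEX go_demo_go_id_idx ON go_demo (go_id)")
plan_after = plan(cur, query)    # with the index: a search that names the index

print(plan_before)
print(plan_after)
conn.close()
```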

In [22]:
sql = '''
CREATE INDEX gene_go_idx 
ON go_bp_ALT (gene_go_id)
'''
cursor.execute(sql)
connection.commit()


##### Check if the new index appears in the `sqlite_master` table 

In [24]:
sql = '''
SELECT name
FROM sqlite_master 
WHERE type= "index";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name
sqlite_autoindex_metadata_1
sqlite_autoindex_map_counts_1
sqlite_autoindex_genes_1
sqlite_autoindex_gene_info_1
sqlite_autoindex_chrlengths_1
Fchromosomes
Faccessions
Fcytogenetic_locations
Fomim
Frefseq
Fpubmed
Funigene
Fgo_bp
Fgo_bp_go_id
Fgo_mf
Fgo_mf_go_id
Fgo_cc
Fgo_cc_go_id
Fgo_bp_all
Fgo_bp_all_go_id
Fgo_mf_all
Fgo_mf_all_go_id
Fgo_cc_all
Fgo_cc_all_go_id
Fkegg
Fec
Fchromosome_locations
Fpfam
Fprosite
Falias
Fensembl
Fensembl2ncbi
Fncbi2ensembl
Fensemblp
Fensemblt
Funiprot
Fucsc
gene_go_idx


#### Remove the index

In [25]:
sql = '''
DROP INDEX gene_go_idx 
'''
cursor.execute(sql)
connection.commit()


##### Check if the index was removed from the `sqlite_master` table 

In [26]:
sql = '''
SELECT name, sql
FROM sqlite_master 
WHERE type= "index" AND name = "gene_go_idx";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name	sql



### INSERT - statement

Makes changes to the database table<br>
Adds new data to a table (if the constraints are met)<br>
Constraint examples: 
* The values in a column (or group of columns) designated as Primary Key must be unique
* A value inserted in a column that has a Foreign Key constraint must exist in the column that it refers to

```sql
INSERT INTO <tablename> (<column1>, <column2>, <column3>) VALUES (value1, value2, value3);
```
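
When the statement is run from Python, the values are best passed separately through placeholders rather than pasted into the SQL string. Besides the `?` style, the `sqlite3` module also accepts named placeholders filled from a dictionary, as in this sketch (the table `insert_demo` and its values are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    """
    CREATE TABLE insert_demo (
        gene_id  INTEGER NOT NULL,
        go_id    TEXT    NOT NULL,
        evidence TEXT    NOT NULL
    )
    """
)

# Named placeholders (:name) are matched to dictionary keys, not to positions.
sql = """
INSERT INTO insert_demo (gene_id, go_id, evidence)
VALUES (:gene_id, :go_id, :evidence);
"""
cur.execute(sql, {"gene_id": 1234, "go_id": "GO:1234", "evidence": "CM_EV"})

# executemany accepts a list of dictionaries in the same way.
cur.executemany(
    sql,
    [
        {"evidence": "CM_EV", "gene_id": 1235, "go_id": "GO:1235"},
        {"go_id": "GO:1236", "evidence": "CM_EV", "gene_id": 1236},
    ],
)
conn.commit()

inserted_rows = cur.execute(
    "SELECT gene_id, go_id, evidence FROM insert_demo ORDER BY gene_id"
).fetchall()
print(inserted_rows)
conn.close()
```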

##### One simple INSERT command adds 1 row of data at a time into an existing table  

##### Connection object allows us to:
* ##### COMMIT - save the changes 
* ##### ROLLBACK - reverts/discards the changes

<br>

##### Let's see what is in the table (it should be nothing):

In [27]:
sql = '''
SELECT *
FROM go_bp_ALT;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence



<br>

##### Let's try an insert:
```sql
INSERT INTO <tablename> (<column1>, <column2>, <column3>) VALUES (value1, value2, value3);
```

In [28]:
values_list = [1234,"GO:1234","CM_EV"]

sql = '''
INSERT INTO go_bp_ALT (gene_id, go_id, evidence) 
VALUES (?,?,?);
'''
cursor.execute(sql,values_list)
connection.commit()

In [29]:
# cursor.lastrowid holds the row id of the last row inserted through this cursor
# Here it is the automatically generated gene_go_id

id_value = cursor.lastrowid
id_value

1

<br>


##### We have a row in the table!!! And the gene_go_id was automatically generated.

In [30]:
sql = '''
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence
1	1234	GO:1234	CM_EV


#### You can pass a Python "table" structure (list of lists) of insert values to `executemany` and get them all inserted in one command; each sublist must have the correct number of values.


In [31]:
values_tbl = [[1235,"GO:1235","CM_EV"], [1236,"GO:1236","CM_EV"], [1236,"GO:1237","CM_EV"]]

sql = '''
INSERT INTO go_bp_ALT (gene_id, go_id, evidence) 
VALUES (?,?,?);
'''
cursor.executemany(sql,values_tbl)
connection.commit()


In [32]:
sql = '''
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence
1	1234	GO:1234	CM_EV
2	1235	GO:1235	CM_EV
3	1236	GO:1236	CM_EV
4	1236	GO:1237	CM_EV


In [33]:
go_tbl = [["GO:1238","ND"], ["GO:1239","ND"], ["GO:1240","IDE"]]
gene_id = 4
for go_elem in go_tbl:
    go_elem.insert(0,gene_id) 
print(go_tbl)
sql = '''
INSERT INTO go_bp_ALT (gene_id, go_id, evidence) 
VALUES (?,?,?);
'''
cursor.executemany(sql,go_tbl)
connection.commit()

[[4, 'GO:1238', 'ND'], [4, 'GO:1239', 'ND'], [4, 'GO:1240', 'IDE']]


In [34]:
sql = '''
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence
1	1234	GO:1234	CM_EV
2	1235	GO:1235	CM_EV
3	1236	GO:1236	CM_EV
4	1236	GO:1237	CM_EV
5	4	GO:1238	ND
6	4	GO:1239	ND
7	4	GO:1240	IDE


#### UPDATE - statement - changes the table rows



Modifies data (already in a table)  in all rows matching the WHERE clause 

```sql
UPDATE table_name 
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
```

UPDATE is often meant to change a single row, but it changes every row that matches the WHERE clause (and every row in the table if the WHERE clause is omitted) <br>
(whether you intended to or not !!!!)
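
One way to guard against an unintended multi-row change is to look at `cursor.rowcount` before committing. The sketch below assumes, for the sake of the example, that only one row was meant to change; the table `go_demo` and the variable `expected_rows` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE go_demo (gene_id INTEGER, go_id TEXT, evidence TEXT)")
cur.executemany(
    "INSERT INTO go_demo VALUES (?, ?, ?)",
    [
        (4, "GO:1238", "ND"),
        (4, "GO:1239", "ND"),
        (4, "GO:1240", "IDE"),
        (1235, "GO:1235", "CM_EV"),
    ],
)
conn.commit()

expected_rows = 1  # hypothetical expectation: only one row should change
cur.execute("UPDATE go_demo SET evidence = ? WHERE gene_id = ?", ("IDA", 4))
affected_rows = cur.rowcount  # number of rows the UPDATE actually changed

if affected_rows == expected_rows:
    conn.commit()
else:
    # More (or fewer) rows than intended were changed: undo the UPDATE.
    conn.rollback()
    print(f"rolled back: {affected_rows} rows matched, expected {expected_rows}")

evidence_after = [
    row[0] for row in cur.execute("SELECT evidence FROM go_demo ORDER BY go_id")
]
print(evidence_after)
conn.close()
```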

The following statement sets `gene_id` to 5 and `go_id` to "GO:1234" in every row where `gene_id` is 4 

In [36]:
sql = '''
UPDATE go_bp_ALT
SET gene_id = 5, go_id = "GO:1234" 
WHERE gene_id = 4;
'''
cursor.execute(sql)
connection.commit()

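In [40]:
# Note: the execution count shows that this cell was last run after the DELETE statement below,
# so the output reflects the table after both the UPDATE and the DELETE
sql = '''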
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence
2	1235	GO:1235	CM_EV
4	1236	GO:1237	CM_EV


#### DELETE - statement - deletes table rows

* MAKES CHANGES TO THE DATA
* Deletion works at the row level: whole rows are removed (to clear a single value, use UPDATE instead) 

```sql
DELETE FROM <tablename> WHERE <column> = <value>
```

* The WHERE predicate is the same as for the SELECT statement, that is, it determines which rows will be deleted  



In [38]:
sql = '''
DELETE FROM go_bp_ALT 
WHERE go_id IN ("GO:1234","GO:1236");
'''
cursor.execute(sql)
connection.commit()


In [39]:
sql = '''
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence
2	1235	GO:1235	CM_EV
4	1236	GO:1237	CM_EV


```sql
DELETE FROM <tablename>; 
```

* This deletes all rows of data from the table.
* Preserves the table structure (the table still exists)
* Optimized for speed in SQLite: without a WHERE clause the rows are not deleted one by one.
* An existence check for the table (e.g. `IF EXISTS` in `DROP TABLE IF EXISTS <tablename>`) still succeeds
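
The difference between emptying a table and removing it can be checked through `sqlite_master`. This is a minimal sketch on an in-memory database; the table `del_demo` and the two helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE del_demo (gene_id INTEGER, go_id TEXT)")
cur.executemany(
    "INSERT INTO del_demo VALUES (?, ?)",
    [(1, "GO:1"), (2, "GO:2"), (3, "GO:3")],
)
conn.commit()


def count_rows(cursor):
    """Number of rows currently stored in del_demo."""
    return cursor.execute("SELECT count(*) FROM del_demo").fetchone()[0]


def table_exists(cursor):
    """True if del_demo is still listed as a table in sqlite_master."""
    found = cursor.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'del_demo'"
    ).fetchone()[0]
    return found == 1


rows_before = count_rows(cur)
cur.execute("DELETE FROM del_demo")      # removes every row, keeps the table
conn.commit()
rows_after = count_rows(cur)
exists_after_delete = table_exists(cur)

cur.execute("DROP TABLE del_demo")       # removes the table itself
conn.commit()
exists_after_drop = table_exists(cur)

print(rows_before, rows_after, exists_after_delete, exists_after_drop)
conn.close()
```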


In [41]:
sql = '''
DELETE FROM go_bp_ALT;
'''
cursor.execute(sql)
connection.commit()


In [42]:
sql = '''
SELECT *
FROM go_bp_ALT ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

gene_go_id	gene_id	go_id	evidence



<br>

#### `DROP TABLE` - statement - removes a table (permanently)

In [43]:
sql = '''
DROP TABLE IF EXISTS go_bp_ALT;
'''
cursor.execute(sql)
connection.commit()

In [44]:
sql = '''
SELECT name AS "TABLE NAME"
FROM sqlite_master 
WHERE name LIKE "go_bp%";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

TABLE NAME
go_bp
go_bp_all


#### VIEW in a database

* A view is a virtual table which can be created from a query on existing tables
* Views are created to give a more human readable version of the normalized data / tables
* http://www.sqlitetutorial.net/sqlite-create-view/
* An SQLite view is read only

```sql
CREATE [TEMP] VIEW [IF NOT EXISTS] view_name(column-name-list) AS    
select-statement;
```
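
Two properties of views are easy to verify: a view always reflects the current content of the underlying tables, and it cannot be written to directly. The sketch below uses made-up tables `gene_demo` and `go_demo` and a hypothetical view `ida_demo` in an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE gene_demo (_id INTEGER PRIMARY KEY, symbol TEXT)")
cur.execute("CREATE TABLE go_demo (_id INTEGER, go_id TEXT, evidence TEXT)")
cur.executemany("INSERT INTO gene_demo VALUES (?, ?)", [(1, "A1BG"), (2, "A2M")])
cur.executemany(
    "INSERT INTO go_demo VALUES (?, ?, ?)",
    [(1, "GO:0002576", "TAS"), (2, "GO:0001869", "IDA")],
)

# Hypothetical view: symbols and GO terms with IDA evidence only.
cur.execute(
    """
    CREATE VIEW ida_demo (symbol, go_id) AS
    SELECT gi.symbol, go.go_id
    FROM gene_demo AS gi
    INNER JOIN go_demo AS go ON gi._id = go._id
    WHERE go.evidence = 'IDA'
    """
)
view_before = cur.execute("SELECT * FROM ida_demo ORDER BY go_id").fetchall()

# A new row in the underlying table shows up in the view without recreating it.
cur.execute("INSERT INTO go_demo VALUES (1, 'GO:0000003', 'IDA')")  # made-up term
view_after = cur.execute("SELECT * FROM ida_demo ORDER BY go_id").fetchall()

try:
    # Writing to the view itself is refused by SQLite.
    cur.execute("INSERT INTO ida_demo VALUES ('A2M', 'GO:0000004')")
    view_is_read_only = False
except sqlite3.Error as err:
    view_is_read_only = True
    print("rejected:", err)

print(view_before)
print(view_after)
conn.close()
```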

In [46]:
# gene go information for easy access
sql = '''
SELECT symbol, go_id, evidence
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id == go._id
WHERE evidence IN ("EXP","IDA") ;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

symbol	go_id	evidence
A2M	GO:0001869	IDA
AANAT	GO:0006474	IDA
AANAT	GO:0030187	IDA
AANAT	GO:0071320	IDA
AARS	GO:0006419	IDA
ABCA1	GO:0007040	IDA
ABCA1	GO:0008203	IDA
ABCA1	GO:0016197	IDA
ABCA1	GO:0033344	IDA
ABCA1	GO:0033700	IDA
ABCA1	GO:0042632	IDA
ABCA1	GO:0045332	IDA
ABCA2	GO:0006357	IDA
ABCA4	GO:0045332	IDA
ABL1	GO:0006974	IDA
ABL1	GO:0006975	IDA
ABL1	GO:0018108	IDA
ABL1	GO:0038083	IDA
ABL1	GO:0042770	IDA
ABL1	GO:0043065	IDA
ABL1	GO:0046777	IDA
ABL1	GO:0050731	IDA
ABL1	GO:0051353	IDA
ABL1	GO:0051444	IDA
ABL1	GO:0070301	IDA
ABL1	GO:0071103	IDA
ABL1	GO:0071901	IDA
ABL1	GO:1990051	IDA
ABL1	GO:2001020	IDA
AOC1	GO:0035874	IDA
AOC1	GO:0042493	IDA
AOC1	GO:0046677	IDA
AOC1	GO:0055114	IDA
AOC1	GO:0071280	IDA
AOC1	GO:0071420	IDA
AOC1	GO:0097185	IDA
ABL2	GO:0018108	IDA
ABL2	GO:0051353	IDA
ACACB	GO:0006084	IDA
ACACB	GO:0051289	IDA
ACADM	GO:0033539	IDA
ACADM	GO:0051791	IDA
ACADM	GO:0051793	IDA
ACADM	GO:0055114	IDA
ACADS	GO:0033539	IDA
ACADVL	GO:0033539	IDA
ACAT1	GO:0006085	IDA
ACAT1	GO:0015936	

In [47]:
# gene go information for easy access
sql = '''
CREATE VIEW IF NOT EXISTS gene_go_info (symbol, go_id, evidence) AS
SELECT symbol, go_id, evidence
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id == go._id
WHERE evidence IN ("EXP","IDA") ;
'''
cursor.execute(sql)
connection.commit()

In [48]:
# gene go information 
sql = '''
SELECT *
FROM gene_go_info
LIMIT 10;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

symbol	go_id	evidence
A2M	GO:0001869	IDA
AANAT	GO:0006474	IDA
AANAT	GO:0030187	IDA
AANAT	GO:0071320	IDA
AARS	GO:0006419	IDA
ABCA1	GO:0007040	IDA
ABCA1	GO:0008203	IDA
ABCA1	GO:0016197	IDA
ABCA1	GO:0033344	IDA
ABCA1	GO:0033700	IDA


In [49]:
sql = '''
SELECT type, name AS "TABLE NAME"
FROM sqlite_master 
WHERE name = "gene_go_info";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

type	TABLE NAME
view	gene_go_info


```sql
DROP VIEW [IF EXISTS] view_name;
```

In [53]:
# remove the view
sql = '''
DROP VIEW IF EXISTS gene_go_info;
'''
cursor.execute(sql)
connection.commit()

In [54]:
sql = '''
SELECT type, name AS "TABLE NAME"
FROM sqlite_master 
WHERE name = "gene_go_info";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

type	TABLE NAME



#### JOIN tables



In [59]:
sql = '''
SELECT gi._id, symbol, evidence
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id = go._id
LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

_id	symbol	evidence
1	A1BG	TAS
1	A1BG	ND
1	A1BG	TAS
2	A2M	IDA
2	A2M	TAS


In [60]:
sql = '''
SELECT * 
FROM genes
LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

_id	gene_id
1	1
2	2
3	3
4	9
5	10


In [62]:
sql = '''
SELECT sql 
FROM sqlite_master
WHERE name = "go_bp";
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

sql
CREATE TABLE go_bp (
      _id INTEGER NOT NULL,                         -- REFERENCES  genes 
      go_id CHAR(10) NOT NULL,                      -- GO ID
      evidence CHAR(3) NOT NULL,                    -- GO evidence code
      FOREIGN KEY (_id) REFERENCES  genes  (_id)
    )


In [64]:
sql = '''
SELECT symbol, gene_id, count(go_id)
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id = go._id
INNER JOIN genes g
ON g._id = gi._id
GROUP BY symbol, gene_id
LIMIT 5;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

symbol	gene_id	count(go_id)
A1BG	1	3
NAT2	10	1
ADA	100	42
CDH2	1000	24
AKT3	10000	13


In [67]:
sql = '''
SELECT min(go_no)
FROM
(
SELECT symbol, gene_id, count(go_id) go_no
FROM gene_info AS gi
INNER JOIN go_bp AS go
ON gi._id = go._id
INNER JOIN genes g
ON g._id = gi._id
GROUP BY symbol, gene_id
);
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

min(go_no)
1


In [None]:
# Close the cursor and the connection when finished

cursor.close()
connection.close() 

#### To remove the database, delete the .sqlite file.