# Data and Database

In data science, acquiring data is a major step that follows formulating the problem at hand. Data can be collected from different sources and in different formats. For the purpose of this notebook, the sources can be grouped (by format) into:

* Text Files: Files can come in different formats and can be either hosted online or saved on a local drive. The most common formats are Excel, csv, txt, and pdf.
* Web scraping: Python has tools to access and extract data from websites. Beautiful Soup is one of the best-known libraries for pulling data out of HTML and XML files.
* Application Programming Interfaces (APIs): An API is a software intermediary that allows two applications to talk to each other.
* Databases: There are several database types, such as relational, NoSQL, hierarchical, network, and object-oriented databases. This notebook focuses on relational databases.
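
As a rough illustration of the first and third source types, the sketch below parses a small CSV string and a small JSON string using only the standard library. The contents of `csv_text` and `json_text` are hypothetical stand-ins for a text file and an API response body; they are not taken from any real dataset. In practice, pandas functions such as `pd.read_csv` would usually do this work.

```python
import csv
import io
import json

# Hypothetical CSV content, standing in for a local or downloaded text file.
csv_text = "field,doctorates\nEngineering,10\nPhysics,7\n"

# Hypothetical JSON payload, standing in for the body of an API response.
json_text = (
    '[{"field": "Engineering", "doctorates": 10},'
    ' {"field": "Physics", "doctorates": 7}]'
)

# csv.DictReader yields one dict per row; every value is read as a string,
# so the numeric column is converted explicitly.
csv_rows = [
    {"field": row["field"], "doctorates": int(row["doctorates"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# json.loads keeps the types encoded in the payload (here, integers).
api_rows = json.loads(json_text)

print(csv_rows)
print(api_rows)
```

The point of the comparison is that CSV carries no type information, while JSON does, so the two sources need slightly different handling before they can be combined.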

To pull data from a database, Python has libraries such as pyodbc and SQLAlchemy. The following is an example of pulling data from a Microsoft SQL Server database with pyodbc and SQLAlchemy, in conjunction with the pandas library.


In [None]:
import pandas as pd
import pyodbc
import sqlalchemy as sal

In [None]:
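# Read data from a SQL Server instance on the local machine
# (Trusted_Connection=yes uses Windows authentication, so no password is needed)
conn = pyodbc.connect("Driver={SQL Server};"
                      "Server=DESKTOP-MS8S2RN;"
                      "Database=NCSES;"
                      "Trusted_Connection=yes;"
)

# A cursor is only needed for running raw SQL directly; pandas does not use it
cursor = conn.cursor()

# Load the whole table into a DataFrame and index it by the three field columns
df = pd.read_sql_query('SELECT * FROM dbo.SED1', conn)
df = df.set_index(['S&E Fields', 'Broad Fields', 'Detailed Fields'])
df.head()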
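
The same connect, query, fetch pattern can be tried without a SQL Server instance. The sketch below swaps in the standard library's `sqlite3` module with an in-memory database, so it is not the notebook's setup. The `degrees` table and its rows are hypothetical and unrelated to the `SED1` table above.

```python
import sqlite3

# In-memory SQLite database, used here only so the example runs anywhere.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows; not the SED1 table from the notebook.
conn.execute("CREATE TABLE degrees (field TEXT, year INTEGER, doctorates INTEGER)")
conn.executemany(
    "INSERT INTO degrees VALUES (?, ?, ?)",
    [("Engineering", 2019, 10), ("Physics", 2019, 7), ("Physics", 2020, 8)],
)
conn.commit()

# The ? placeholder lets the driver bind the value, instead of building
# the SQL text with string formatting.
cursor = conn.execute(
    "SELECT field, year, doctorates FROM degrees WHERE field = ? ORDER BY year",
    ("Physics",),
)

# cursor.description holds one tuple per result column; the name comes first.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
print(rows)
```

pyodbc also uses `?` as its placeholder, and `pd.read_sql_query` accepts a `params` argument, so a filtered query against SQL Server can be written the same way.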


In [None]:
server_name = 'DESKTOP-MS8S2RN'
database_name = 'NCSES'
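# Build the URL from the variables above; query parameters are joined
# with '&' and the space in the driver name is encoded as '+'
engine = sal.create_engine(
    f'mssql+pyodbc://{server_name}/{database_name}?driver=SQL+Server&Trusted_Connection=yes'
)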



# Establish the connection to the database, using the engine as the interface
conn = engine.connect()


# df = pd.read_sql(query, conn)
# df.head()

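# Print the names of the tables present in the database
# (engine.table_names() was removed in SQLAlchemy 2.0; the inspector replaces it)
# print(sal.inspect(engine).get_table_names())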

In [None]:
engine
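
An engine URL has to be URL-safe, which is awkward for ODBC strings that contain braces, spaces, and semicolons. One option that SQLAlchemy's `mssql+pyodbc` dialect documents is to pass the whole ODBC string through the `odbc_connect` query parameter after percent-encoding it. The sketch below only builds the URL, using the standard library, and assumes the same server and database names as above; the `create_engine` call is left commented out because it needs SQLAlchemy and a reachable server.

```python
from urllib.parse import quote_plus, unquote_plus

server_name = "DESKTOP-MS8S2RN"
database_name = "NCSES"

# Raw ODBC connection string, with the same keys as the pyodbc example.
odbc_str = (
    "Driver={SQL Server};"
    f"Server={server_name};"
    f"Database={database_name};"
    "Trusted_Connection=yes;"
)

# Percent-encode the string so braces, spaces, and semicolons are URL-safe.
engine_url = "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_str)
print(engine_url)

# Decoding the parameter recovers the original ODBC string.
print(unquote_plus(engine_url.split("odbc_connect=", 1)[1]) == odbc_str)

# With SQLAlchemy installed, the URL would be used like this:
# engine = sal.create_engine(engine_url)
```

This form reuses the pyodbc connection string unchanged, so the two connection methods in this notebook stay in sync.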