# Data Science Intro

**Introduction to Data Science:** Data Science, Brief History of Data Science, Increasing
attention to data science, Fundamental fields of study to data science, Data science and
Related Terminologies, Types of Analytics, Application of Data Science, Data Science Process
Model

**Python environment and basics of Python:** Jupyter notebook, setting working directory in
python, variables, data types, operators, functions in python

# Introduction to Data Science

## 1. Data Science
Data Science is an interdisciplinary field that focuses on extracting knowledge and insights from structured and unstructured data. It combines elements of statistics, computer science, mathematics, and domain expertise to analyze, process, and interpret large volumes of data.

### Key Components of Data Science:
- **Data Collection**: Gathering raw data from various sources like databases, APIs, and web scraping.
- **Data Cleaning**: Removing inconsistencies, duplicates, and errors to prepare data for analysis.
- **Data Analysis**: Applying statistical methods and algorithms to understand trends and patterns.
- **Data Visualization**: Representing data insights through charts, graphs, and dashboards.
- **Model Building**: Using machine learning algorithms to predict or classify outcomes.

---


![image.png](attachment:3156f62d-1c7d-4884-9d12-de701bf46026.png)


## 2. Brief History of Data Science
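- **1960s**: Statisticians such as John Tukey argue for data analysis as a discipline in its own right ("The Future of Data Analysis", 1962).
- **1980s**: Relational database systems become widespread, and the term "data mining" comes into use around the end of the decade.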
- **2000s**: The term "Data Science" gained prominence as a distinct field.
- **2010s**: Explosive growth in big data, AI, and machine learning technologies.
- **Present**: Data Science is integral to various industries, including healthcare, finance, and technology.

---

## 3. Increasing Attention to Data Science
The growing importance of data science can be attributed to:
- The **massive growth of data** generated by businesses, social media, and IoT devices.
- The availability of **powerful computational resources** like GPUs and cloud computing.
- Advancements in **machine learning algorithms**.
- Widespread adoption of data-driven decision-making in industries.

---

## 4. Fundamental Fields of Study to Data Science
Data Science draws knowledge from:
- **Statistics**: For hypothesis testing, probability, and inferential analysis.
- **Computer Science**: For programming, algorithm design, and database management.
- **Mathematics**: For linear algebra, calculus, and optimization.
- **Domain Expertise**: Understanding the specific context or industry.

---

## 5. Data Science and Related Terminologies
- **Big Data**: Extremely large datasets that require specialized tools to process.
- **Machine Learning**: Algorithms that learn from data to make predictions or decisions.
- **Artificial Intelligence (AI)**: Broader field encompassing machine learning and intelligent systems.
- **Data Engineer**: Focuses on building and maintaining data infrastructure.
- **Data Analyst**: Focuses on interpreting data and generating insights.

---

## 6. Types of Analytics
1. **Descriptive Analytics**: Summarizes historical data (e.g., sales reports).
2. **Diagnostic Analytics**: Investigates why something happened (e.g., root cause analysis).
3. **Predictive Analytics**: Forecasts future outcomes (e.g., sales predictions).
4. **Prescriptive Analytics**: Recommends actions to optimize outcomes (e.g., supply chain optimization).

---

## 7. Applications of Data Science
- **Healthcare**: Predicting diseases, personalized medicine.
- **Finance**: Fraud detection, risk management.
- **Retail**: Recommendation systems, inventory optimization.
- **Transportation**: Route optimization, self-driving cars.
- **Entertainment**: Content recommendation, audience analysis.

---

## 8. Data Science Process Model
1. **Define the Problem**: Clearly state the objective.
2. **Data Collection**: Gather relevant data.
3. **Data Cleaning**: Prepare data for analysis.
4. **Data Exploration**: Use visualizations to understand data.
5. **Model Building**: Create and train machine learning models.
6. **Model Evaluation**: Assess model performance.
7. **Deployment**: Implement the model in production.
8. **Monitoring and Maintenance**: Continuously improve the model.
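
The steps above can be sketched on a toy problem. This is a minimal illustration, not a production workflow: the data values, the helper names `clean` and `fit_line`, and the choice of a least-squares line as the "model" are all assumptions made for the example.

```python
# Hypothetical toy data: (hours_studied, exam_score); None marks a missing value.
raw = [(1, 52), (2, 55), (3, None), (4, 61), (5, 64), (5, 64)]

def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        out.append(row)
    return out

def fit_line(rows):
    """Model building: least-squares line y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

data = clean(raw)                    # steps 2-3: collect and clean
slope, intercept = fit_line(data)    # step 5: build the model
# Step 6: mean absolute error, measured on the training data for brevity.
mae = sum(abs(y - (slope * x + intercept)) for x, y in data) / len(data)

print(data)
print(round(slope, 2), round(intercept, 2), round(mae, 2))
```

A real project would evaluate on data the model has not seen; the same-data check here only keeps the sketch short.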

---

# Python Environment and Basics of Python

## 1. Jupyter Notebook
- Jupyter Notebook is an open-source interactive environment for writing and running Python code.
- **Installation and launch**:
  ```bash
  pip install notebook
  jupyter notebook
  ```
- **Features**:
  - Code cells for running Python code.
  - Markdown cells for documentation.
  - Visualizations inline using libraries like Matplotlib and Seaborn.
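
The syllabus also lists setting the working directory. A minimal sketch using the standard `os` module is given below; the temporary directory is only a stand-in (an assumption for this example) for whatever project folder you would actually switch to.

```python
import os
import tempfile

original_dir = os.getcwd()      # directory the notebook or script started in
print("Current working directory:", original_dir)

# Switch to another directory. A temporary one is used so the example runs
# anywhere; in practice this would be your project folder.
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    inside_tmp = os.path.samefile(os.getcwd(), tmp)
    os.chdir(original_dir)      # switch back before the temp dir is deleted

print("Changed and restored:", inside_tmp)
```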

---

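## 2. Create a Virtual Environment

- Create the environment: `python -m venv <environment_name>`
- Activate the environment:
  - Windows (Git Bash): `source <environment_name>/Scripts/activate`
  - Linux/macOS: `source <environment_name>/bin/activate`
- Install the packages you need with `pip install <package>`, then record the installed versions: `pip freeze > requirements.txt`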
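
To confirm from inside Python that the environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch; the variable name `in_venv` is made up for the example.

```python
import sys

# Inside a venv, sys.prefix points at the environment, while
# sys.base_prefix still points at the Python installation it was created from.
in_venv = sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Running inside a virtual environment:", in_venv)
```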



# Why Python?
(Python was first released in 1991, a few years before Java in 1995.)

1. **Design Philosophy**
   - Simple, elegant, and easy to learn.
2. **Batteries Included**
   - Comes with built-in functionalities; no need to reinvent the wheel.
3. **General Purpose**
   - Supports various programming paradigms; versatile for multiple applications.
4. **Libraries & Community**
   - Extensive libraries and a large, supportive community.

---

## Why Python for Data Science?

1. **Ease of Learning**
   - Intuitive syntax and readability.
2. **Math-Friendly**
   - Seamless integration with mathematical operations.
3. **Community Support**
   - Robust ecosystem with tools and libraries tailored for data science.

---

## R vs Python

| Feature                | R                                     | Python                                |
|------------------------|---------------------------------------|---------------------------------------|
| **Learning Curve**     | Steeper for beginners.                | Beginner-friendly.                    |
| **Statistical Analysis**| Advanced built-in statistical tools. | Requires libraries like NumPy, SciPy. |
| **Data Visualization** | Excellent visualization packages like ggplot2. | Strong libraries like Matplotlib, Seaborn. |
| **Flexibility**         | Primarily for statistical computing.  | General-purpose, suitable for various tasks. |
| **Community**          | Smaller, academia-focused.            | Large, diverse, and industry-focused. |

---

## Python Basics

### 1. Python Output
- Python is case-sensitive (e.g., `HELLO` ≠ `hello`).


In [6]:
print("Hello world!")

Hello world!


In [7]:
print(hello) # gives error

NameError: name 'hello' is not defined

In [8]:
print(7)

7


In [9]:
print(7.7)

7.7


In [10]:
print(True)

True


In [11]:
print('Hello',1,4.5,True)

Hello 1 4.5 True


In [1]:
print('Hello','world',1,2,3.4,sep="\n",end="+++")

Hello
world
1
2
3.4+++
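
The effect of `sep` and `end` is easier to see when the output is captured as a string. The sketch below writes into an in-memory buffer through `print`'s `file` parameter; the names `buffer` and `captured` are only illustrative.

```python
import io

buffer = io.StringIO()

# Defaults: sep=' ' between values and end='\n' after the last one.
print('Hello', 1, 4.5, True, file=buffer)
# Custom separator and terminator.
print('a', 'b', 'c', sep='-', end='!\n', file=buffer)

captured = buffer.getvalue()
print(repr(captured))   # repr() makes the newlines visible
```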

-----------
## 2. Data Types


In [13]:
# Integer
print(8)
# Python ints have arbitrary precision; 1e309 is a float literal, and floats
# overflow to inf above roughly 1.8e308
print(1e309)

8
inf


In [14]:
# Decimal/Float
print(8.55)
print(1.7e309)

8.55
inf
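
The two `inf` results above come from the float type, not from integers. A short sketch of the difference, assuming nothing beyond the standard library:

```python
import math
import sys

big_int = 10 ** 309      # an int: exact, however large
big_float = 1e309        # a float literal beyond the representable range

print(type(big_int).__name__, len(str(big_int)))   # an int with 310 digits
print(sys.float_info.max)                          # largest finite float
print(big_float, math.isinf(big_float))
```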


In [15]:
# Boolean
print(True)
print(False)

True
False


In [16]:
# Text/String
print('Hello World')

Hello World


In [17]:
# complex
print(5+6j)

(5+6j)


In [4]:
a_tup = (1,2,3)
a_tup

(1, 2, 3)

In [5]:
a_tup[1] = 9999
a_tup

TypeError: 'tuple' object does not support item assignment
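
Because tuples are immutable, "changing" an element means building a new tuple. A minimal sketch (the name `new_tup` is illustrative):

```python
a_tup = (1, 2, 3)

try:
    a_tup[1] = 9999          # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

# Build a new tuple from slices of the old one instead.
new_tup = a_tup[:1] + (9999,) + a_tup[2:]

print(error_message)
print(a_tup, new_tup)
```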

In [18]:
# List (the closest Python counterpart to an array in C)
print([1,2,3,4,5])

[1, 2, 3, 4, 5]


In [19]:
# Tuple
print((1,2,3,4,5))

(1, 2, 3, 4, 5)


In [6]:
# Sets
print({1,2,3,3,3,3,3,3,4,5})

{1, 2, 3, 4, 5}


In [21]:
# Dictionary
print({'name':'Nitish','gender':'Male','weight':70})

{'name': 'Nitish', 'gender': 'Male', 'weight': 70}


In [22]:
# type
type([1,2,3])

list

In [23]:
print(type(23.3))

<class 'float'>


-----------
## 3. Variable


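<span style="color:red">Interview Questions: Static Vs Dynamic Typing</span>

**Dynamic Typing**: Python infers a variable's type from the value assigned to it, so no declaration is needed (`a = 5`). **Static Typing**: languages such as C/C++ and Java fix each variable's type when it is declared (`int a = 5`).

<br>
<span style="color:red">Interview Questions: Static Vs Dynamic Binding</span>

**Static Binding**: In languages such as C/C++ and Java, the data type of a variable cannot change after it is declared. **Dynamic Binding**: In Python, the same name can later be reassigned to a value of a different type.

<br>
<span style="color:red">Interview Questions: Stylish declaration techniques</span>

`a,b,c = 1,2,3`
<br>
`a=b=c=5`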

In [24]:
# Static Vs Dynamic Typing
# Static Vs Dynamic Binding
# stylish declaration techniques

In [25]:
name = 'nitish'
print(name)

a = 5
b = 6

print(a + b)

nitish
11


Dynamic Binding (Python) <br>
`a = 5`


Static Binding (C/C++, Java) <br>
`int a = 5`

In [26]:
# Dynamic Binding
a = 5
print(a)
a = 'nitish'
print(a)


# Static Binding
# int a=5

5
nitish


In [27]:
a=1
b=2
c=3

print(a,b,c)

1 2 3


In [28]:
a,b,c = 3,2,1
print(a,b,c)

3 2 1


In [29]:
a=b=c=5
print(a,b,c)

5 5 5
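
Two details of these declaration styles are worth checking by hand. Tuple unpacking can swap values, and chained assignment binds every name to the same object, which matters for mutable values such as lists. The variable names below are illustrative.

```python
# Tuple unpacking evaluates the right-hand side first, so it can swap values.
a, b = 1, 2
a, b = b, a

# Chained assignment binds every name to the same object.
x = y = []
x.append(1)          # y sees the change: x and y are one list

# Separate lists need separate expressions.
p, q = [], []
p.append(1)          # q is unaffected

print(a, b)
print(x, y, x is y)
print(p, q, p is q)
```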


In [30]:
# this is a comment 
# second line 
a=4
b=6 # like this 
# second comment 
print(a+b)


10


In [7]:

"""
This is the syntax for multiline comments
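Practice writing documentation using comments - it is really important
"""

'\nThis is the syntax for multiline comments\nPractice writing documentation using comments - it is really important\n'

Note: Python has no dedicated multiline-comment syntax. The block above is an unassigned string literal, which is why the notebook echoes it.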

## 4. Keywords & Identifiers 


- Python has 35 reserved keywords in recent versions (the exact count varies by version; `keyword.kwlist` lists them).
- Keywords cannot be used as identifiers.

In [32]:
# Keywords - reserved words with predefined meaning in python 

In [33]:
# Identifiers 
# Names given to classes, functions or variables.
# identifiers cannot be keywords

In [34]:
# Compiled vs interpreted languages

In [35]:
# Rules for identifiers
# An identifier cannot start with a digit
name1 = 'Hello'
print(name1)
# The only special character allowed is the underscore (_)
_ = 'Hii'
print(_)
# Identifiers cannot be keywords

Hello
Hii
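
These rules can be checked programmatically with `str.isidentifier()` and `keyword.iskeyword()`. The candidate names below are made up for the example.

```python
import keyword

candidates = ['name1', '_', '1name', 'my-var', 'class', 'Class']

for text in candidates:
    # A usable variable name must be a valid identifier and not a keyword.
    usable = text.isidentifier() and not keyword.iskeyword(text)
    print(f'{text!r:10} usable as a name: {usable}')

usable_names = [t for t in candidates
                if t.isidentifier() and not keyword.iskeyword(t)]
```

`Class` passes while `class` does not, because keywords are case-sensitive.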


## 5. User Input

In [8]:
# Static vs dynamic software
input('Enter Something')

Enter Something aasdfjsdfs


'aasdfjsdfs'

In [13]:
## taking input from users and storing it in a variable 
x = input('Enter a number here')
fnum = int(input('enter first number'))
snum = int(input('enter second number'))

print('The type of x is ',type(x)) # string even if you have entered a number 

# add 2 vars and store it in sum 
result = fnum + snum
print('The sum is ',result)
print(type(fnum),type(snum),type(result))

Enter a number here 2.2
enter first number 2.2


ValueError: invalid literal for int() with base 10: '2.2'

In [15]:
type(type(12))

type

In [39]:
# Why does input() return a string?
# Because everything the user types arrives as text, and a string can represent any input; convert it explicitly when a number is needed
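
The `ValueError` shown earlier for `'2.2'` can be avoided by trying `int` first and falling back to `float`. This is one possible sketch; the helper name `parse_number` is hypothetical, and plain strings stand in for what `input()` would return.

```python
def parse_number(text):
    """Return an int if possible, otherwise a float; raise ValueError if neither."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)   # raises ValueError itself for non-numeric text

# Strings stand in for what input() would return.
for raw in ['12', ' 2.2 ', 'abc']:
    try:
        value = parse_number(raw)
        print(repr(raw), '->', value, type(value).__name__)
    except ValueError:
        print(repr(raw), '-> not a number')
```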

## 6. Type Conversion


In [40]:
# Implicit Vs Explicit
print(5+5.6)
print(type(5),type(5.6))

print(4 + '4')

10.6
<class 'int'> <class 'float'>


TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [41]:
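# Explicit conversion
# str -> int
int('4')
# int(4+5j) would raise TypeError: a complex number cannot be converted to int

# int -> str
str(5)

# int -> float (the notebook echoes only the last expression's value)
float(4)

4.0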
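
A few explicit conversions behave in ways that are easy to misremember. The sketch below collects them; the tuple `results` exists only so the values can be inspected together.

```python
print(int(7.9), int(-7.9))    # int() truncates toward zero
print(round(7.9))             # round() goes to the nearest integer
print(int(float('2.2')))      # str -> float -> int works in two steps
print('4' + str(4))           # explicit str() makes concatenation legal
print(int('4') + 4)           # explicit int() makes addition legal

results = (int(7.9), int(-7.9), round(7.9), int(float('2.2')),
           '4' + str(4), int('4') + 4)
```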

## 7. Literals


In [16]:
a = 0b1010 #Binary Literals
b = 100 #Decimal Literal 
c = 0o310 #Octal Literal
d = 0x12c #Hexadecimal Literal

#Float Literal
float_1 = 10.5 
float_2 = 1.5e2 # 1.5 * 10^2
float_3 = 1.5e-3 # 1.5 * 10^-3

#Complex Literal 
x = 3.14j

print(a, b, c, d)
print(float_1, float_2,float_3)
print(x, x.imag, x.real)

10 100 200 300
10.5 150.0 0.0015
3.14j 3.14 0.0


In [18]:
# Complex
x = 3.14j
print(x.imag)

3.14


In [19]:
string = 'This is Python'
strings = "This is Python"
char = "C"
multiline_str = """This is a


multiline string with more

than one line code."""
unicode = u"\U0001f600\U0001F606\U0001F923"
raw_str = r"raw \n string"

print(string)
print(strings)
print(char)
print(multiline_str)
print(unicode)
print(raw_str)

This is Python
This is Python
C
This is a


multiline string with more

than one line code.
😀😆🤣
raw \n string


In [22]:
# Type promotion in mixed arithmetic: bool -> int -> float (float > int > bool)

In [23]:
a = True + 4 + 0.0
b = False + 10

print("a:", a)
print("b:", b)

a: 5.0
b: 10
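
Since `True` and `False` behave as `1` and `0` in arithmetic, summing comparisons is a common way to count matches. The scores and the pass mark of 50 are invented for this sketch.

```python
scores = [35, 72, 48, 90, 66]

# Each comparison yields True or False; sum() adds them as 1 or 0.
passed = sum(score >= 50 for score in scores)
pass_rate = passed / len(scores)

print(passed, pass_rate)
print(isinstance(True, int), True == 1, False == 0)
```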


In [46]:
# predefining a variable

k = None
a = 5
b = 6
print('Program exe')

Program exe


## Operators in Python 

- Arithmetic
- Relational 
- Logical 
- Bitwise
- Assignment
- Membership

In [48]:
# Arithmetic Operators
print(5+6)

print(5-6)

print(5*6)

print(5/6)

print(5//2) # Floor division (result rounded down to an integer)

print(5%2)

print(5**2)
print(4**0.5)

11
-1
30
0.8333333333333334
2
1
25
2.0
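
Floor division and modulo differ from truncation once negative numbers are involved. A short sketch; `identity_holds` is just a name for the check at the end.

```python
print(7 // 2, -7 // 2)    # floor division rounds down, also for negatives
print(7 % 2, -7 % 2)      # the remainder takes the sign of the divisor
print(divmod(7, 2))       # quotient and remainder in one call
print(int(-7 / 2))        # truncation toward zero, for comparison

# Invariant that ties // and % together:
a, b = -7, 2
identity_holds = (a // b) * b + a % b == a
```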


In [49]:
# Relational Operators 
print(4>5)
print(4<5)
print(4<=4)
print(4>=4)
print(4==4)
print(4!=4)


False
True
True
True
True
False


In [50]:
# Logical operators 

print(0 and 0)
print(1 and 1)
print(1 and 0)
print(0 and 1)

print(0 or 0)
print(1 or 1)
print(1 or 0)
print(0 or 1)

print(not 1)
print(not 0)

0
1
0
0
0
1
1
1
False
True
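
The output above shows `0` and `1` rather than `False` and `True` because `and` and `or` return one of their operands. They also short-circuit, skipping the right-hand side when the result is already decided. The function `never_called` below is a made-up helper that exists only to demonstrate this.

```python
print(0 or 'fallback')         # first operand is falsy, so the second is returned
print('value' or 'fallback')   # first operand is truthy, so it is returned
print(5 and 0)                 # first operand is truthy, so the second is returned
print('' and 5)                # first operand is falsy, so it is returned

def never_called():
    raise RuntimeError('short-circuiting should skip this call')

# The right-hand side is not evaluated once the result is already decided.
safe = False and never_called()
also_safe = True or never_called()
```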


In [51]:
# Bitwise Operators

# bitwise and 
print(2&3)

# bitwise or
print(300 | 2)

#bitwise xor
print(2^3)

#bitwise not 
print(~2)

#leftshift and right shift
print(4>>2)
print(4<<2)

2
302
1
-3
1
16
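
Printing the operands in binary with `bin()` makes the bitwise results easier to follow. A small sketch using the same values, 2 and 3:

```python
a, b = 2, 3

print(bin(a), bin(b))     # 0b10 0b11
print(bin(a & b))         # bits set in both operands
print(bin(a | b))         # bits set in either operand
print(bin(a ^ b))         # bits set in exactly one operand
print(~a == -(a + 1))     # ~x equals -(x + 1) for Python ints
print(4 << 2 == 4 * 2**2, 4 >> 2 == 4 // 2**2)   # shifts scale by powers of two
```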


In [52]:
# Assignment Operators 

#= 
a=2
a+=2 # a = a+2

print(a)

4


In [53]:
a-=2
a*=2
a/=2
a%=2
a//=2

In [54]:
a = 17
a//=3
a

5

In [55]:
# Membership operators 

# in/not in 

print('D' in 'Delhi')
print('d' in 'Delhi')

print('1' not in 'Delhi')
print('1' in 'Delhi')

True
False
True
False


In [56]:
print( 1 in [2,3,4,4])

False


In [24]:
# program: find the sum of the digits of a 3-digit number entered by the user

number = int(input('Enter a 3 digit number '))
print(number )

Enter a 3 digit number  345


345


In [26]:
27//2

13

In [25]:
a = number %10
number//=10
b = number%10 
number//=10
c = number %10

print(a,b,c)
print(a+b+c)

5 4 3
12
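
The same `% 10` and `// 10` steps extend to any number of digits when placed in a loop. A minimal sketch; the function name `digit_sum` is illustrative.

```python
def digit_sum(number):
    """Sum the decimal digits of an integer (the sign is ignored)."""
    number = abs(number)
    total = 0
    while number > 0:
        total += number % 10    # take the last digit
        number //= 10           # drop the last digit
    return total

print(digit_sum(345))
print(digit_sum(90210))
```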


## If-else in Python

In [59]:
# login program and indentation 

dbemail = 'one@one.com'
dbpass = '1234'

email = input('enter email')
password = input('Enter the password ')

if(email == dbemail and dbpass == password):
    print("Login successful")
elif (email == dbemail and dbpass != password):
    print("Incorrect Password")
    password = input("Enter the password")
    if password ==dbpass:
        print("Finally Welcome ")
    else:
        print("Wrong again account is locked")
else:
    print("Wrong credentials")


enter email one@one.com
Enter the password  1234


Login successful


In [60]:
# if-else examples 
# 1. Find the min of 3 given numbers 
# 2. Menu driven program 

In [61]:
# 1
# min of 3 numbers 

a = 31
b = 39
c = 33

if a<b and a<c:
    print(a)
elif b<c:
    print(b)
else:
    print(c)

31
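
A comparison chain for the minimum can be wrapped in a function and checked against the built-in `min`. This is a sketch; the name `min_of_three` is made up here.

```python
def min_of_three(a, b, c):
    """Smallest of three values using only comparisons."""
    if a <= b and a <= c:
        return a
    elif b <= c:
        return b
    else:
        return c

print(min_of_three(31, 39, 33))
print(min(31, 39, 33))    # the built-in gives the same answer
```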


In [62]:
# menu driven Calculator 

fnum = int(input('Enter the first number '))
snum = int(input('Enter the second number '))

op = input('Enter the operation ')

if op == '+':
    print(fnum+snum)
elif op == '-':
    print(fnum-snum)
elif op == '*':
    print(fnum*snum)
elif op == '/':
    print(fnum/snum)
else:
    print("invalid operator")


Enter the first number  12
Enter the second number  32
Enter the operation  +


44
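
An alternative to the `if`/`elif` chain is a dictionary that maps each symbol to a function from the standard `operator` module. The sketch below also guards against division by zero, which the chain above would let through as a `ZeroDivisionError`. The names `OPERATIONS` and `calculate` are assumptions for this example.

```python
import operator

# Map each operator symbol to the function that implements it.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def calculate(fnum, snum, op):
    """Return the result, or a message for bad input instead of crashing."""
    if op not in OPERATIONS:
        return 'invalid operator'
    if op == '/' and snum == 0:
        return 'cannot divide by zero'
    return OPERATIONS[op](fnum, snum)

print(calculate(12, 32, '+'))
print(calculate(12, 0, '/'))
print(calculate(12, 32, '%'))
```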


In [63]:
# menu driven banking software program 

## Modules in Python 

In [64]:
# math 
import math 

math.sqrt(25)

5.0

In [65]:
sqrt = math.sqrt 
sqrt(44)

6.6332495807108

In [66]:
math.factorial(5)

120

In [67]:
math.floor(6.8)

6

In [68]:
math.ceil(6.5)

7

In [69]:
# keyword 

import keyword

print(keyword.kwlist)
print(len(keyword.kwlist))

['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
35


In [30]:
a = [1,2,3,4,6]
len(a),a

(5, [1, 2, 3, 4, 6])

In [70]:
# random 
import random 
print(random.randint(1,100))

38
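
The number printed above changes on every run. When a repeatable sequence is needed, for example while debugging, the generator can be seeded. The seed value 42 is arbitrary.

```python
import random

random.seed(42)     # fix the generator's starting state
first_run = [random.randint(1, 100) for _ in range(5)]

random.seed(42)     # same seed, same sequence
second_run = [random.randint(1, 100) for _ in range(5)]

print(first_run == second_run)
print(all(1 <= n <= 100 for n in first_run))   # both endpoints are inclusive
```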


In [31]:
# date time 
import datetime

print(datetime.datetime.now())

2024-12-25 22:12:35.633616


In [32]:
# list every module importable in the current environment
# (standard library plus installed third-party packages)
help('modules')


Please wait a moment while I gather a list of all available modules...

IPython             cgitb               markdown            stack_data
PIL                 chardet             markdown_it         start_pythonwin
__future__          charset_normalizer  markupsafe          stat
...

## Loops in Python 

- Need for loops 
- While loop
- For Loop

In [74]:
# While loop example -> program to print the multiplication table
# program : sum of all digits of a given number
# program : keep accepting numbers from the user until they
# enter 0, then find the average

In [75]:
number = int(input("Enter the number "))
i =1

while i<11:
    print(number, ' X ',i,' = ',number*i)
    i+=1

Enter the number  122


122  X  1  =  122
122  X  2  =  244
122  X  3  =  366
122  X  4  =  488
122  X  5  =  610
122  X  6  =  732
122  X  7  =  854
122  X  8  =  976
122  X  9  =  1098
122  X  10  =  1220


## Loop else

In [33]:
# while loop with else 
x = 1
while x<3:
    print(x)
    x +=1
else:
    print('limit crossed')
    
# else runs once the loop condition becomes false (it is skipped if the loop exits via break)

1
2
limit crossed
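
The `else` block is most useful together with `break`: it runs only when the loop finishes without breaking. The sketch below uses that to search for a divisor; the function name `first_divisor` is illustrative.

```python
def first_divisor(n):
    """Return the smallest divisor of n greater than 1, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            found = d
            break           # leaving via break skips the else block
        d += 1
    else:
        found = None        # loop ended normally: no divisor was found
    return found

print(first_divisor(91))
print(first_divisor(97))
```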


In [34]:
# Guessing game 

# generate a random integer between 1 and 100 (both inclusive)

import random 
jackpot = random.randint(1,100)
attempts = 1
guess = int(input('make a guess'))

while guess != jackpot:
    # the loop body only runs for wrong guesses; if a break were hit here,
    # the loop's else block would be skipped
    if guess < jackpot:
        print(' wrong, guess higher ')
    else:
        print('wrong, guess lower')
        
    guess = int(input('make a guess'))
    attempts += 1
    
else:
    print(" Correct guess ! you took ",attempts," attempts ")

make a guess 50


wrong, guess lower


make a guess 25


 wrong, guess higher 


make a guess 38


 wrong, guess higher 


make a guess 45


wrong, guess lower


make a guess 41


 wrong, guess higher 


make a guess 43


 Correct guess ! you took  6  attempts 


## For loops

In [78]:
range(1,10)

range(1, 10)

In [79]:
list(range(1,10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [80]:
for i in range(1,11,2): # start, end - exclusive, step
    print(i)

1
3
5
7
9


In [81]:
for i in range(10,0,-1):
    print(i)

10
9
8
7
6
5
4
3
2
1


In [82]:
list(range(10,0,-1))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
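
A `range` is a lazy sequence: it supports `len`, indexing, and membership tests without building a list. A short sketch of those properties:

```python
r = range(10, 0, -1)

print(len(r))               # number of values, computed without building a list
print(r[0], r[-1])          # ranges support indexing
print(5 in r, 0 in r)       # the stop value is excluded
print(list(range(5)))       # start defaults to 0 and step to 1
print(list(range(5, 1)))    # empty: a positive step cannot count down
```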

Program - The current population of a town is 10000, and it has been growing at 10% per year. Write a program to find the population at the end of each of the last 10 years.

Let x be the population one year earlier. Then:

x + 10% of x = 10000

x + 0.1x = 10000

1.1x = 10000

x = 10000/1.1

So each step back in time divides the population by 1.1.

In [85]:
curr_pop  = 10000
for i in range(10,0,-1):
    print(i,curr_pop)
    curr_pop = curr_pop/1.1

10 10000
9 9090.90909090909
8 8264.462809917353
7 7513.148009015775
6 6830.134553650703
5 6209.213230591548
4 5644.739300537771
3 5131.5811823070635
2 4665.07380209733
1 4240.976183724845
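
The repeated division can also be written in closed form: the population k years before the current one is 10000 / 1.1^k. The sketch below rebuilds the same table that way and runs a forward check. The names `history` and `recovered` are illustrative.

```python
current_pop = 10000
growth = 0.10

# Year 10 is the current year, so year y lies (10 - y) years in the past.
history = {year: current_pop / (1 + growth) ** (10 - year)
           for year in range(10, 0, -1)}

for year, pop in history.items():
    print(year, round(pop, 2))

# Forward check: growing the year-1 figure for 9 years returns today's value.
recovered = history[1] * (1 + growth) ** 9
```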
