<div style="text-align:center"><span style="font-size:2em; font-weight: bold;">Lecture 2—Scripting</span></div>

## Programming: Libraries

Source: **Clean Python: Elegant Coding in Python** Sunil Kapil Apress 2019 (available through O'Reilly Safari)

**collections**

This is one of the most widely used libraries and has useful data structures, specifically namedtuple, defaultdict, and OrderedDict.
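
As a minimal sketch of the three structures named above (the `Card` type and the sample words are made up for illustration):

```python
from collections import namedtuple, defaultdict, OrderedDict

# namedtuple: a lightweight record whose fields can be read by name
Card = namedtuple("Card", ["suit", "rank"])   # hypothetical example type
card = Card(suit="spades", rank="A")

# defaultdict: a missing key starts from a default value (0 for int)
counts = defaultdict(int)
for word in ["a", "b", "a", "c", "a"]:
    counts[word] += 1

# OrderedDict: remembers insertion order and can move entries around
od = OrderedDict([("first", 1), ("second", 2)])
od.move_to_end("first")

print(card.rank, dict(counts), list(od))
```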

**csv**

Use csv for reading and writing CSV files. It will save you a lot of time compared with writing your own methods for reading files.

**datetime and time**

These are without a doubt two of the most used libraries. In fact, you have probably already encountered them. If not, getting familiar with the different methods available in these libraries is beneficial in different scenarios, especially when you are working with timing issues.
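
A small sketch of typical uses, assuming an arbitrary example date: `datetime` handles calendar arithmetic and formatting, while `time.perf_counter` is one common way to time a piece of code.

```python
from datetime import datetime, timedelta
import time

start = datetime(2024, 1, 31, 9, 30)          # arbitrary example timestamp
later = start + timedelta(days=1, hours=2)    # arithmetic rolls over the month end
stamp = later.strftime("%Y-%m-%d %H:%M")      # format as text

t0 = time.perf_counter()                      # high-resolution timer
sum(range(100000))
elapsed = time.perf_counter() - t0            # seconds spent in the sum

print(stamp, elapsed)
```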

~~**math**~~

~~The math lib has lots of useful methods to perform basic to advanced math computations. Before looking for a third-party library to solve math problems, try to see whether this library already has them.~~

**re**

There is no substitute for this library that can solve problems using regular expressions. In fact, re is one of the best libraries in the Python language. If you know regular expressions well, you can create magic using the re library. It gives you the power to perform some of the more difficult operations easily using regular expressions.
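
For instance, the three calls most people start with are `re.search`, `re.findall`, and `re.sub`. The sample line below is made up for illustration:

```python
import re

line = "faminc=13.5, cigtax=16.5, bwght=109"   # made-up sample text

# findall returns every non-overlapping match; the groups capture name and value
pairs = re.findall(r"(\w+)=([\d.]+)", line)

# search returns the first match object, or None when nothing matches
match = re.search(r"bwght=(\d+)", line)
weight = int(match.group(1)) if match else None

# sub replaces every match with the given text
masked = re.sub(r"[\d.]+", "#", line)

print(pairs, weight, masked)
```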

~~**tempfile**~~

~~Consider this a one-off library to create temporary files. It’s a good built-in library.~~

**itertools**

Some of the most useful tools in this library are permutations and combinations. However, if you explore it more, you will find that you can solve a lot of computation problems using itertools. It also has other useful functions such as dropwhile, product, chain, and islice.
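
A quick sketch of the functions named above on small made-up inputs:

```python
from itertools import permutations, combinations, product, chain, islice, dropwhile

letters = ["a", "b", "c"]
perms = list(permutations(letters, 2))      # ordered pairs without repetition
combs = list(combinations(letters, 2))      # unordered pairs
pairs = list(product([0, 1], repeat=2))     # Cartesian product of [0, 1] with itself
joined = list(chain([1, 2], [3], [4, 5]))   # several iterables as one stream
first_three = list(islice(range(100), 3))   # slice an iterator lazily
tail = list(dropwhile(lambda v: v < 3, [1, 2, 3, 1, 4]))  # skip the leading small values

print(len(perms), combs, pairs, joined, first_three, tail)
```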

**functools**

If you are a developer who loves functional programming, this library is for you. It has lots of functions that will help you to think of your code in a more functional way. One of the most used functions, partial, is in this library.
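
As a minimal sketch, `partial` fixes some arguments of a function to make a new one, and `reduce` folds a sequence into a single value. The `power` helper is hypothetical:

```python
from functools import partial, reduce

def power(base, exponent):
    """Hypothetical helper used only for this illustration."""
    return base ** exponent

square = partial(power, exponent=2)   # power with exponent fixed at 2
cube = partial(power, exponent=3)     # power with exponent fixed at 3

squares = list(map(square, range(5)))
total = reduce(lambda a, b: a + b, squares)   # fold the list into one number

print(squares, cube(2), total)
```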

**sys** and **os**

Use these libraries when you want to perform any specific system- or OS-level operations. sys and os give you the power to do a lot of amazing things with your system.

**subprocess**

This library helps you to create multiple processes on your system without much effort. The library is easy to use, and it creates multiple processes and handles them using multiple methods.

~~**logging**~~

~~No big project could be successful without a good logging feature. The logging library from Python helps you to easily add logging in your system. It has different ways to spit out logs such as the console, files, and the network.~~

**json**

JSON is the de facto standard for passing information over a network and for APIs. The json library from Python does a great job of handling different scenarios. The json library interface is easy to use, and the documentation is pretty good.
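
A short round trip, assuming a made-up record: `json.dumps` turns basic Python objects into text and `json.loads` turns the text back.

```python
import json

record = {"name": "bwght", "values": [109, 133, 129], "complete": True}  # made-up record

text = json.dumps(record, sort_keys=True)   # Python object -> JSON text
restored = json.loads(text)                 # JSON text -> Python object

print(text)
```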

**pickle**

You might not use it in daily coding, but whenever you need to serialize and deserialize a Python object, there is no better library than pickle.
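
A minimal round trip through bytes, using a made-up dictionary. Keep in mind that unpickling can run arbitrary code, so only load pickle data you trust.

```python
import pickle

hands = {"pair": 3, "other": 7}    # made-up counts for illustration

blob = pickle.dumps(hands)         # Python object -> bytes
restored = pickle.loads(blob)      # bytes -> an equal Python object

print(type(blob).__name__, restored)
```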

~~**\_\_future\_\_**~~

~~This is a pseudomodule that enables new language features that are not compatible with the current interpreter. So, you might want to consider using them in your code where you want to use a future version.~~

### Example: Using csv

In [1]:
cd

C:\Users\shaur


In [2]:
cd C:/Users/shaur/OneDrive/Desktop/UTD Work/Sem 3/PDS/jupyter notebooks

C:\Users\shaur\OneDrive\Desktop\UTD Work\Sem 3\PDS\jupyter notebooks


In [3]:
#import csv # give us access to csv functions
from csv import reader

file = open('BWGHT.csv')  # open a file BWGHT.csv
readr = reader(file) # make a reader for the file using the reader function from the csv module
for i in range(5):        # we are going to read 5 lines
    print(next(readr))   # next() moves through the file and returns a list of each item in the line
file.close()              # close the file when done

['faminc', 'cigtax', 'cigprice', 'bwght', 'fatheduc', 'motheduc', 'parity', 'male', 'white', 'cigs', 'lbwght', 'bwghtlbs', 'packs', 'lfaminc']
['13.5', '16.5', '122.3', '109', '12', '12', '1', '1', '1', '0', '4.691348', '6.8125', '0', '2.60269']
['7.5', '16.5', '122.3', '133', '6', '12', '2', '1', '0', '0', '4.890349', '8.3125', '0', '2.014903']
['.5', '16.5', '122.3', '129', '', '12', '2', '0', '0', '0', '4.859812', '8.0625', '0', '-.6931472']
['15.5', '16.5', '122.3', '126', '12', '12', '2', '1', '0', '0', '4.836282', '7.875', '0', '2.74084']


Now that we have called import, we have access to those functions in this notebook. This persists between cells. Equivalently, we could have written the following, which uses a with block to close the file automatically:

In [4]:
import csv 

# with command as variable_name:  
with open('BWGHT.csv') as filename:  # this will close the file automatically preventing potential errors
    reader = csv.reader(filename)
    for i in range(5):
        print(next(reader)) 
print('the file is closed')

['faminc', 'cigtax', 'cigprice', 'bwght', 'fatheduc', 'motheduc', 'parity', 'male', 'white', 'cigs', 'lbwght', 'bwghtlbs', 'packs', 'lfaminc']
['13.5', '16.5', '122.3', '109', '12', '12', '1', '1', '1', '0', '4.691348', '6.8125', '0', '2.60269']
['7.5', '16.5', '122.3', '133', '6', '12', '2', '1', '0', '0', '4.890349', '8.3125', '0', '2.014903']
['.5', '16.5', '122.3', '129', '', '12', '2', '0', '0', '0', '4.859812', '8.0625', '0', '-.6931472']
['15.5', '16.5', '122.3', '126', '12', '12', '2', '1', '0', '0', '4.836282', '7.875', '0', '2.74084']
the file is closed


In [8]:
with open('BWGHT.csv') as file: 
    for i in range(5):
        line = file.readline()
        print(line.strip())

faminc,cigtax,cigprice,bwght,fatheduc,motheduc,parity,male,white,cigs,lbwght,bwghtlbs,packs,lfaminc
13.5,16.5,122.3,109,12,12,1,1,1,0,4.691348,6.8125,0,2.60269
7.5,16.5,122.3,133,6,12,2,1,0,0,4.890349,8.3125,0,2.014903
.5,16.5,122.3,129,,12,2,0,0,0,4.859812,8.0625,0,-.6931472
15.5,16.5,122.3,126,12,12,2,1,0,0,4.836282,7.875,0,2.74084
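
A further option, sketched here on a few values copied from the output above (trimmed to four columns and held in memory rather than read from `BWGHT.csv`), is `csv.DictReader`, which uses the header row as keys. Every field still arrives as a string, so numeric columns have to be converted.

```python
import csv
import io

# First rows of the data shown above, trimmed to four columns for brevity
sample = io.StringIO(
    "faminc,cigtax,cigprice,bwght\n"
    "13.5,16.5,122.3,109\n"
    "7.5,16.5,122.3,133\n"
    ".5,16.5,122.3,129\n"
)

rows = list(csv.DictReader(sample))               # one dict per data row
weights = [int(row["bwght"]) for row in rows]     # fields are strings until converted
mean_weight = sum(weights) / len(weights)

print(rows[0]["faminc"], weights, mean_weight)
```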


## Miscellaneous: bash on Windows

Windows does not have bash scripting capability natively. That said, we have some options for getting bash functionality:

1. PowerShell
2. Docker or virtual machines
3. Ubuntu on Windows
4. Cygwin (be sure to get cygrunsrv, sed/gawk, ssmtp, and ssh; be sure to skip python and anything you can install without Cygwin)
5. Git Bash

## Topic: Scripting
### Concepts
#### Foundations
##### Idea: Simple general purpose scripts chained together to make complex programs

##### Benefits (if you don't see the benefits immediately, don't worry; that's normal)
* Run programs on a fixed schedule
* Simple multithreading (running several processes at once) for better performance
* Code stability/persistence
    
##### Tools	
**ls** lists files in a directory structure.

**grep** looks through standard input for a particular regular expression match. If a match is found anywhere in a line, that line is sent to standard output; otherwise the line is discarded.

**sed/awk** are simple data tools for reading particular parts of output or writing basic conditional logic.

**cron** can be used to run jobs on a fixed schedule.

**python** can be used to write simple or complex programs for use in/with these basic pipe commands. In fact, all of these simple programs can be written in Python. If you can leverage these basic tools, you can make your Python code much more effective and built for much more general purposes.
#### Python libraries

**sys** is a good library for dealing with standard input, standard output, standard error output, and commandline arguments.

**os** is a good library for reading directory structures and system structure/features.

**subprocess** can be used to spawn subprocesses that run other programs, including other Python programs, at the same time to create multithreading (parallelism).

In [5]:
import sys

sys.stdin # file-like object for reading standard input
sys.argv # list of the commandline arguments the script was called with
print('hello') # this goes to standard output
raise Exception('The code broke') # the uncaught exception's traceback goes to standard error

hello


Exception: The code broke
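
To make the three streams concrete, here is a minimal sketch of a filter in the pipe style described above. The function name `shout` is made up, and in-memory streams stand in for a real pipe so the cell can run inside a notebook.

```python
import io
import sys

def shout(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Copy each input line to stdout in upper case; report the line count on stderr."""
    count = 0
    for line in stdin:
        stdout.write(line.upper())
        count += 1
    print(f"processed {count} lines", file=stderr)
    return count

# Demonstration with in-memory streams instead of a real pipe
out, err = io.StringIO(), io.StringIO()
n = shout(io.StringIO("hello\nworld\n"), out, err)
print(repr(out.getvalue()), repr(err.getvalue()))
```

Called with no arguments inside a script, the same function would read from the real standard input, so the script could sit in the middle of a pipeline.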

### Example: Poker hands simulation

For the task of writing a program to count the frequencies of poker hands, I wrote the following code:

In [6]:
from itertools import product
import random

def get_hand_name(suits:dict,nums:dict):
    suitvec =   ['\u2660','\u2665','\u2663','\u2666']
    numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K','A']
    straights = [set(numbervec[i:i+5]) for i in range(10)]
    nvals = list(nums.values())
    nvals.sort()
    if set(nums.keys()) in straights and len(suits)==1:
        if set(nums.keys()) == straights[-1]: return 'royal'
        return 'stflush'
    if nvals[-1] >= 4: return 'quad'
    if nvals[-1] == 3 and nvals[-2] == 2: return 'fullhouse'
    if len(suits)==1: return 'flush'
    if set(nums.keys()) in straights: return 'straight'
    if nvals[-1] >= 3: return 'trip'
    if nvals[-1] >= 2 and nvals[-2] >= 2: return 'twopair'
    if nvals[-1] >= 2: return 'pair'
    return 'other'
def get_suits_and_nums(hand):
    suits = {}
    nums = {}
    for card in hand:
        suit = card[0]
        num = card[1:]
        suits[suit] = 1+suits[suit] if suit in suits.keys() else 1
        nums[num] = 1+nums[num] if num in nums.keys() else 1
    return suits,nums
def run_poker(nsim:int,seed:int):
    suitvec = ['\u2660','\u2665','\u2663','\u2666']
    numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
    deck = [i+j for i,j in product(suitvec,numbervec)]
    hands = {'royal':0,'stflush':0,'quad':0,'fullhouse':0,'flush':0,\
             'straight':0,'trip':0,'twopair':0,'pair':0,'other':0}
    random.seed(seed)
    for i in range(nsim):
        random.shuffle(deck)
        hand = deck[:5]
        suits,nums = get_suits_and_nums(hand)
        name = get_hand_name(suits,nums)
        hands[name] += 1
    return hands

%timeit print(run_poker(int(1e6),0))

{'royal': 1, 'stflush': 19, 'quad': 254, 'fullhouse': 1426, 'flush': 1968, 'straight': 3879, 'trip': 21208, 'twopair': 47470, 'pair': 422593, 'other': 501182}

In [7]:
from itertools import product

suitvec = ['\u2660','\u2665','\u2663','\u2666']
numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
for i,j in product(suitvec,numbervec):
    print(i+j)

♠A
♠2
♠3
♠4
♠5
♠6
♠7
♠8
♠9
♠10
♠J
♠Q
♠K
♥A
♥2
♥3
♥4
♥5
♥6
♥7
♥8
♥9
♥10
♥J
♥Q
♥K
♣A
♣2
♣3
♣4
♣5
♣6
♣7
♣8
♣9
♣10
♣J
♣Q
♣K
♦A
♦2
♦3
♦4
♦5
♦6
♦7
♦8
♦9
♦10
♦J
♦Q
♦K


In [8]:
x = 3
print(7*x)

21


In [6]:
%cd ~/Downloads

C:\Users\jason\Downloads


In [18]:
suitvec = ['\u2660','\u2665','\u2663','\u2666']
suitvec

['♠', '♥', '♣', '♦']

In [8]:
import itertools
suitvec = ['\u2660','\u2665','\u2663','\u2666']
numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']

print(list(itertools.product(suitvec,numbervec)))

[('♠', 'A'), ('♠', '2'), ('♠', '3'), ('♠', '4'), ('♠', '5'), ('♠', '6'), ('♠', '7'), ('♠', '8'), ('♠', '9'), ('♠', '10'), ('♠', 'J'), ('♠', 'Q'), ('♠', 'K'), ('♥', 'A'), ('♥', '2'), ('♥', '3'), ('♥', '4'), ('♥', '5'), ('♥', '6'), ('♥', '7'), ('♥', '8'), ('♥', '9'), ('♥', '10'), ('♥', 'J'), ('♥', 'Q'), ('♥', 'K'), ('♣', 'A'), ('♣', '2'), ('♣', '3'), ('♣', '4'), ('♣', '5'), ('♣', '6'), ('♣', '7'), ('♣', '8'), ('♣', '9'), ('♣', '10'), ('♣', 'J'), ('♣', 'Q'), ('♣', 'K'), ('♦', 'A'), ('♦', '2'), ('♦', '3'), ('♦', '4'), ('♦', '5'), ('♦', '6'), ('♦', '7'), ('♦', '8'), ('♦', '9'), ('♦', '10'), ('♦', 'J'), ('♦', 'Q'), ('♦', 'K')]


Next I considered comparing this to multithreaded code, so I wrote the simulation to a file that can be run as a separate process. Note the use of pickle to store the results and sys to read commandline arguments:

In [10]:
%cd C:/Users/jason/Dropbox/utdallas/buan6340/post

C:\Users\jason\Dropbox\utdallas\buan6340\post


In [10]:
dump_string = '''
from itertools import product
import random
import sys
import pickle

nsim = int(sys.argv[1])
seed = int(sys.argv[2])

def get_hand_name(suits:dict,nums:dict):
    suitvec =   ['\\u2660','\\u2665','\\u2663','\\u2666']
    numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K','A']
    straights = [set(numbervec[i:i+5]) for i in range(10)]
    nvals = list(nums.values())
    nvals.sort()
    if set(nums.keys()) in straights and len(suits)==1:
        if set(nums.keys()) == straights[-1]: return 'royal'
        return 'stflush'
    if nvals[-1] >= 4: return 'quad'
    if nvals[-1] == 3 and nvals[-2] == 2: return 'fullhouse'
    if len(suits)==1: return 'flush'
    if set(nums.keys()) in straights: return 'straight'
    if nvals[-1] >= 3: return 'trip'
    if nvals[-1] >= 2 and nvals[-2] >= 2: return 'twopair'
    if nvals[-1] >= 2: return 'pair'
    return 'other'
def get_suits_and_nums(hand):
    suits = {}
    nums = {}
    for card in hand:
        suit = card[0]
        num = card[1:]
        suits[suit] = 1+suits[suit] if suit in suits.keys() else 1
        nums[num] = 1+nums[num] if num in nums.keys() else 1
    return suits,nums
def run_poker(nsim:int,seed:int):
    suitvec = ['\\u2660','\\u2665','\\u2663','\\u2666']
    numbervec = ['A','2','3','4','5','6','7','8','9','10','J','Q','K']
    deck = [i+j for i,j in product(suitvec,numbervec)]
    hands = {'royal':0,'stflush':0,'quad':0,'fullhouse':0,'flush':0,\
             'straight':0,'trip':0,'twopair':0,'pair':0,'other':0}
    random.seed(seed)
    for i in range(nsim):
        random.shuffle(deck)
        hand = deck[:5]
        suits,nums = get_suits_and_nums(hand)
        name = get_hand_name(suits,nums)
        hands[name] += 1
    return hands

with open('saves/poker_{0}_{1}.p'.format(nsim,seed),'wb') as pfile:
    pickle.dump(run_poker(nsim,seed),pfile)
'''
with open('poker.py','w') as file:
    file.write(dump_string)

In [11]:
nsim = 1000
seed = 7
'saves/poker_{0}_{1}.p'.format(nsim,seed)

'saves/poker_1000_7.p'

In [9]:
cd ~/Dropbox/utdallas/buan6340/post

C:\Users\jason\Dropbox\utdallas\buan6340\post


Now, I wrote a wrapper using subprocess:

In [12]:
import subprocess
import pickle

def run_parallel_poker(num_procs,num_sims_each):
    procvec = []
    for i in range(num_procs):
        item = subprocess.Popen(['python','poker.py',str(num_sims_each),str(i)])
        procvec += [item]
    for item in procvec:
        item.wait()
def collect_parallel_poker(num_procs,num_sims_each):
    total = {}
    for i in range(num_procs):
        pfile = open('saves/poker_{0}_{1}.p'.format(num_sims_each,i),'rb')
        save = pickle.load(pfile)
        total = {k:save[k]+total.get(k,0) for k in save.keys()}
        pfile.close()
    return total
def parallel_poker(num_procs,num_sims_each):
    run_parallel_poker(num_procs,num_sims_each)
    return collect_parallel_poker(num_procs,num_sims_each)

%timeit print(parallel_poker(100,int(1e6/100)))

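FileNotFoundError: [Errno 2] No such file or directory: 'saves/poker_10000_0.p'

This error comes from the collection step: no pickle file was found, which usually means the worker processes did not write one. A likely cause is that the `saves` directory did not exist in the working directory (opening a file for writing does not create missing directories); creating it first, for example with `os.makedirs('saves', exist_ok=True)`, addresses that case.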

In [30]:
35/13.5

2.5925925925925926

In [26]:
import subprocess

In [27]:
proc = subprocess.Popen(['python','poker.py',"10000","0"],
                        stderr=subprocess.PIPE)
proc.stderr.read()


b''

In [28]:
proc = subprocess.Popen(['python','poker.py',"10000","0"],
                        stdout=subprocess.PIPE)
proc.stdout.read()

b''

What I got was a $16\times$ improvement running on a system with 16 cores (32 threads).
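
The same start-everything-then-collect pattern can be sketched in a self-contained way. This is not the wrapper used above: it assumes a made-up one-line worker (`WORKER`) instead of `poker.py`, passes results back through standard output as JSON rather than through pickle files, and uses `sys.executable` so the children run under the same interpreter as the notebook.

```python
import json
import subprocess
import sys

# Hypothetical worker: a one-line program that sums range(n) and prints the result as JSON
WORKER = ("import sys, json; n = int(sys.argv[1]); "
          "print(json.dumps({'n': n, 'total': sum(range(n))}))")

def run_workers(sizes):
    # Start every child first so that they all run at the same time...
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(n)],
                         stdout=subprocess.PIPE, text=True)
        for n in sizes
    ]
    # ...then collect each child's output; communicate() also waits for it to exit
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker failed with exit code {proc.returncode}")
        results.append(json.loads(out))
    return results

results = run_workers([10, 100, 1000])
print(results)
```

Checking `returncode` makes a failed worker visible immediately instead of surfacing later as a missing result.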

## Data science: Matrix algebra

#### Notation
$$\mathbf{X}=
\begin{bmatrix}
x_{1,1} & x_{1,2} & \dots  & x_{1,r} \\
x_{2,1} & x_{2,2} &        & x_{2,r} \\
\vdots  &         & \ddots & \vdots  \\
x_{n,1} & x_{n,2} & \dots  & x_{n,r} \\
\end{bmatrix}=\left(x_{i,j}\right)_{i=1,..n;j=1,..r}
$$

### Arithmetic
#### Basic operations
**Addition** on matrices works the same way as addition on vectors: $\mathbf{X}+\mathbf{Y}=\left(x_{i,j}+y_{i,j}\right)_{i=1,..n;j=1,..r}$. Similarly, **scalar multiplication** works the same: $a\mathbf{X}=\left(a\cdot x_{i,j}\right)_{i=1,..n;j=1,..r}$. The only special simple operation you need to know is the **transpose**. The transpose switches the rows and the columns of a matrix, so the $(i,j)$ element of $\mathbf{X}'$ is $x_{j,i}$ and an $n\times r$ matrix becomes $r\times n$: $\mathbf{X}'=\left(x_{j,i}\right)_{i=1,..r;j=1,..n}$.
#### Multiplication	
In general, a linear system is one where:
$$\mathbf{A}x=y$$
For ordinary least squares, we define the following linear system:
$$y=\mathbf{X}\beta+e$$
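Matrix multiplication is defined so that this equation is equivalent to the following, where the intercept $\beta_0$ corresponds to a column of ones included in $\mathbf{X}$: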
$$y_i = \beta_0+\beta_1 x_{i,1}+\dots+\beta_r x_{i,r}+e_i$$
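For this to be true, the product of an $n\times r$ matrix $\mathbf{X}$ and an $r\times m$ matrix $\mathbf{Y}$ must be the $n\times m$ matrix: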
$$\mathbf{X}\mathbf{Y}=\left(\sum_{k=1}^r x_{i,k}y_{k,j}\right)_{i=1,..,n;j=1,..,m}$$
You don't need to memorize this. The computer will calculate it for you. You need to be able to read equations like $y=\mathbf{X}\beta+e$ and understand them. Note that we can now revise our inner product definition to:
$\langle x,y \rangle = x'y$
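
The summation formula above can be written out directly. This is a plain-Python sketch on nested lists, meant only to mirror the definition; the helper names and example matrices are made up, and real work would normally use a numerical library.

```python
def transpose(X):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """(XY)[i][j] = sum over k of X[i][k] * Y[k][j]."""
    r = len(Y)
    if any(len(row) != r for row in X):
        raise ValueError("columns of X must equal rows of Y")
    return [[sum(X[i][k] * Y[k][j] for k in range(r)) for j in range(len(Y[0]))]
            for i in range(len(X))]

X = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
Y = [[1, 0, 2], [0, 1, 3]]        # 2 x 3
XY = matmul(X, Y)                 # 3 x 3

x = [[1], [2], [3]]               # column vectors
y = [[4], [5], [6]]
inner = matmul(transpose(x), y)   # x'y, a 1 x 1 matrix

print(XY, inner)
```
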
### Special quantities
#### Special matrices	
The $n\times r$ **zero matrix** is defined by: $\mathbf{0}=(0)_{i=1,..n;j=1,..r}$. Note that for any matrix $\mathbf{X}$ (with zero matrices of conformable dimensions):

$$\mathbf{0} \mathbf{X} =  \mathbf{X} \mathbf{0} = \mathbf{0}$$

The $n\times r$ **ones matrix** is defined by: $\mathbf{1}=(1)_{i=1,..n;j=1,..r}$

The $n\times n$ **identity matrix** is defined by:
$$
\mathbf{I}_n=
\begin{bmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots& 1
\end{bmatrix}
$$
Note that for any $n\times r$ matrix $\mathbf{X}$,
$$\mathbf{I}_n \mathbf{X} =  \mathbf{X} \mathbf{I}_r = \mathbf{X}$$

The **diagonal matrix** of a vector $x$ is defined by:
$$
\text{diag}\left(x\right)=
\begin{bmatrix}
x_1 & 0 & \dots & 0 \\
0 & x_2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots& x_n
\end{bmatrix}
$$

A square ($n\times n$) matrix $\mathbf{X}$ is said to be **symmetric** if and only if:

$$\mathbf{X} \text{ symmetric} \Longleftrightarrow \mathbf{X}=\mathbf{X}'$$
#### Scalar quantities

$a'x$ is a **linear form** of the vector $x$

$x'\mathbf{A}x$ is a **quadratic form** of the vector $x$
#### Measures
The **trace** of a square ($n\times n$) matrix $\mathbf{X}$ is defined as the sum of the elements on the main diagonal:
$\text{tr}(\mathbf X)=\sum_{i=1}^n x_{i,i}$. The **determinant** is an incredibly confusing concept. It's something like a matrix norm, but the concept is confusing even to experts. The determinant of a square matrix $\mathbf X$ is defined by: 
$$
\text{det}(\mathbf X)=
\sum_{\sigma\in S_n}\left(\text{sgn}(\sigma)\prod_{i=1}^n x_{i,\sigma_i}\right)
$$
where $S_n$ is the set of all permutations of the numbers $1,\dots,n$. This is ridiculously obtuse, and I am only putting it here for reference, not for your comprehension. Actual matrix norms exist. The **$L_p$-norm** of a matrix is defined as
$$\max_a \left\Vert \mathbf Xa\right\Vert_p\text{ subject to }\left\Vert a\right\Vert_p=1$$ 
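
For reference, the trace and the permutation formula for the determinant can be transcribed almost literally. This is a plain-Python sketch with made-up example matrices; it takes $\text{sgn}(\sigma)$ to be $+1$ for an even number of inversions and $-1$ for an odd number, and it is far too slow for anything but tiny matrices.

```python
from itertools import permutations

def trace(X):
    """Sum of the main-diagonal elements of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def sign(perm):
    """+1 for an even number of inversions in perm, -1 for an odd number."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(X):
    """Sum over all permutations of sgn(sigma) * product of X[i][sigma_i]."""
    n = len(X)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= X[i][perm[i]]
        total += term
    return total

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]   # made-up example
B = [[1, 2], [2, 4]]                    # second row is twice the first

print(trace(A), det(A), det(B))
```

The matrix `B` has a determinant of zero because its rows are proportional.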

### Matrix algebra
#### Solving systems

When it exists, the **inverse** of a square ($n\times n$) matrix $\mathbf X$ is defined such that:
$$\mathbf X\mathbf X^{-1}=\mathbf X^{-1}\mathbf X=\mathbf I_n$$
This means that the inverse cancels with the matrix so that they both together become an identity matrix. When an inverse exists, we can solve linear systems such as $\mathbf Ax=y$ for $x$. 
$$\begin{align}
\mathbf Ax &=y \\
\mathbf A^{-1}\mathbf Ax &=\mathbf A^{-1}y\\
x &=\mathbf A^{-1}y
\end{align}$$
This is why we need inverses. The question then becomes, under what conditions does the inverse exist? The answer confusingly is that it exists so long as the determinant of the matrix is not equal to zero. If $\text{det}(\mathbf A)\neq 0$ and $\mathbf A$ is square, then $\mathbf A^{-1}$ is guaranteed to exist. This condition is the matrix equivalent of not dividing by zero.

Using the inverse, the ordinary least squares (OLS) estimator for $\beta$ in the system $y=\mathbf X\beta+e$ is:
$$\hat\beta=\left(\mathbf X'\mathbf X\right)^{-1}\mathbf X'y$$
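
As a small worked sketch of this estimator, assume made-up data that lie exactly on the line $y = 1 + 2x$ (so $e = 0$) and a design matrix with a column of ones plus one regressor. Then $\mathbf X'\mathbf X$ is $2\times 2$ and can be inverted with the closed-form formula. The helper names are hypothetical.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix; it exists only when the determinant is nonzero."""
    (a, b), (c, d) = M
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("matrix is singular")
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]

x = [0.0, 1.0, 2.0, 3.0]
y = [[1.0], [3.0], [5.0], [7.0]]        # exactly y = 1 + 2x
X = [[1.0, xi] for xi in x]             # column of ones, then x

Xt = transpose(X)
beta_hat = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y))   # (X'X)^{-1} X'y

print(beta_hat)   # approximately [[1.0], [2.0]], up to floating-point rounding
```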

#### Miscellaneous
Sometimes when working with matrices it is necessary to consider **partitioned matrices**, which are a way of representing a matrix using other smaller matrices. Consider the following $n\times r$ matrix $\mathbf X$:
$$\mathbf X = 
\begin{bmatrix}
\mathbf X_{1,1} & \mathbf X_{1,2} \\
\mathbf X_{2,1} & \mathbf X_{2,2} 
\end{bmatrix}
$$
where to make the dimensions match $\mathbf X_{i,j}$ is $n_i\times r_j$ where $n_1+n_2=n$ and $r_1+r_2=r$. If we transpose $\mathbf X$, then consider what has to happen to the submatrices given that the dimensions still have to match:
$$\mathbf X' = 
\begin{bmatrix}
\mathbf X_{1,1}' & \mathbf X_{2,1}' \\
\mathbf X_{1,2}' & \mathbf X_{2,2}' 
\end{bmatrix}
$$

The **Hadamard product** of two matrices is just the **elementwise product** of those matrices: $\mathbf X\circ\mathbf Y=\left(x_{i,j}y_{i,j}\right)_{i=1,..n;j=1,..r}$. 

On the other hand, the **Kronecker product** is defined as $\mathbf X\otimes\mathbf Y=(x_{i,j}\mathbf Y)_{i=1,..n;j=1,..r}$. This means that we have an entire copy of $\mathbf Y$ for every element of $\mathbf X$. This implies that if $\mathbf X$ is $n\times r$ and $\mathbf Y$ is $m\times s$, then $\mathbf X\otimes\mathbf Y$ is $nm\times rs$. 
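
Both products are short to write out on nested lists. This is a plain-Python sketch with made-up matrices, intended only to show the definitions and the resulting dimensions.

```python
def hadamard(X, Y):
    """Elementwise product of two matrices with the same shape."""
    return [[a * b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]

def kronecker(X, Y):
    """Block matrix holding one scaled copy of Y for every element of X."""
    return [[a * b for a in row_x for b in row_y] for row_x in X for row_y in Y]

X = [[1, 2], [3, 4]]     # 2 x 2
Y = [[0, 1], [1, 0]]     # 2 x 2
Z = [[1, 2, 3]]          # 1 x 3

H = hadamard(X, X)       # 2 x 2
K = kronecker(X, Y)      # 4 x 4
KZ = kronecker(X, Z)     # (2*1) x (2*3) = 2 x 6

print(H, K, KZ)
```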
    

## Programming challenges
### grep

Write an implementation of the grep script in python. 

* Command line argument: regular expression
* Standard input: input to be searched for the regular expression
* Standard output: any line from standard input that has the regular expression


In [7]:
cd C:/Users/shaur/OneDrive/Desktop/UTD Work/Sem 3/PDS/Test

C:\Users\shaur\OneDrive\Desktop\UTD Work\Sem 3\PDS\Test


In [8]:
import re
file = open("test.txt","w")
file.write("My world \n is the best \n Shaurya Tripathi \n hello world")
file.close()

In [9]:
pattern = "Shaurya"
with open("test.txt","r") as file:   # the with block closes the file automatically
    for line in file:
        if re.search(pattern, line):
            print(line)

 Shaurya Tripathi 
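
The cell above searches a file with a hard-coded pattern. A sketch closer to the challenge statement takes the pattern from the commandline and the text from standard input. The function below receives those three pieces as parameters so it can be demonstrated in a notebook with in-memory streams; the script name `pygrep.py` is hypothetical.

```python
import io
import re
import sys

def main(argv, stdin, stdout):
    """Write to stdout every line of stdin that matches the regular expression argv[1]."""
    if len(argv) != 2:
        raise SystemExit("usage: python pygrep.py PATTERN")
    pattern = re.compile(argv[1])
    for line in stdin:
        if pattern.search(line):
            stdout.write(line)   # the line keeps its own newline, so none is added

# Demonstration on the same text that was written to test.txt above
demo_in = io.StringIO("My world \n is the best \n Shaurya Tripathi \n hello world")
demo_out = io.StringIO()
main(["pygrep.py", "world"], demo_in, demo_out)
print(demo_out.getvalue())
```

Saved as a script with `main(sys.argv, sys.stdin, sys.stdout)` as its last line, this could be used in a pipe such as `ls | python pygrep.py txt`.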



### recursive list 

Write a program which takes a directory and lists all the files in that area with their complete paths. These files could exist in that directory itself, but you should also include any files in the sub-directories of that directory.

In [12]:
import os

In [14]:
cd C:/Users/shaur/OneDrive/Desktop/Courses

C:\Users\shaur\OneDrive\Desktop\Courses


In [18]:
def recursive_list(directory):
    files = []
    for file in os.listdir(directory):
        path = os.path.join(directory, file)
        if os.path.isfile(path):
            files.append(path)
        elif os.path.isdir(path):
            files += recursive_list(path)   # recurse into sub-directories
    return files

In [19]:
files = recursive_list("C:/Users/shaur/OneDrive/Desktop/Courses")
for file in files:
    print(file)

C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Download more courses.url
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Downloaded from Demonoid - www.dnoid.to.txt
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Downloaded from TutsGalaxy.com.txt
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Learn and Master Git & Github from zero to Hero\Download more courses.url
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Learn and Master Git & Github from zero to Hero\Downloaded from TutsGalaxy.com.txt
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Learn and Master Git & Github from zero to Hero\TutsGalaxy.com.txt
C:/Users/shaur/OneDrive/Desktop/Courses\Learn and Master Git & Github from zero to Hero\Learn and Master Git & Github from zero to Hero\[Tutsga
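
An alternative sketch uses `os.walk`, which visits every sub-directory for you. The function name is made up, and a temporary directory with a few made-up files stands in for a real folder so the example is self-contained.

```python
import os
import tempfile

def recursive_list_walk(directory):
    """Return the full path of every file under directory, including sub-directories."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)

# Build a small throwaway tree to list
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub", "deeper"))
    for rel in ["a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "deeper", "c.txt")]:
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x")
    found = [os.path.relpath(p, top) for p in recursive_list_walk(top)]

print(found)
```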

### Vampire numbers

A vampire number is any number which has a pair of factors (its fangs) whose digits (in base 10), taken together, are exactly the digits of their product. For instance, 21×60=1260, so we would say that 1260 is a vampire number. The two factors must have the same number of digits, so 51×3=153 does not make 153 a vampire number, and they may not both end in zero. Find all the vampire numbers less than 1000000. 

In [20]:
list('1260')

['1', '2', '6', '0']

In [26]:
import itertools as it

def getFangs(num_str):
    num_iter = it.permutations(num_str, len(num_str))

    for num_tuple in num_iter:
        x, y = num_tuple[:int(len(num_tuple)/2)
                         ], num_tuple[int(len(num_tuple)/2):]
 
        x_str, y_str = ''.join(x), ''.join(y)
 
        if x_str[-1] == '0' and y_str[-1] == '0':
            continue
        if int(x_str) * int(y_str) == int(num_str):
            return x_str, y_str
 
    return None

def isVampire(n):
    n_str = str(n)
    if len(n_str)%2!=0:
        return False
    
    fangs = getFangs(n_str)
    if not fangs:
        return False
    return True

In [27]:
for i in range(1,1000000):
    if isVampire(i):
        print(i)

1260
1395
1435
1530
1827
2187
6880
102510
104260
105210
105264
105750
108135
110758
115672
116725
117067
118440
120600
123354
124483
125248
125433
125460
125500
126027
126846
129640
129775
131242
132430
133245
134725
135828
135837
136525
136948
140350
145314
146137
146952
150300
152608
152685
153436
156240
156289
156915
162976
163944
172822
173250
174370
175329
180225
180297
182250
182650
186624
190260
192150
193257
193945
197725
201852
205785
211896
213466
215860
216733
217638
218488
226498
226872
229648
233896
241564
245182
251896
253750
254740
260338
262984
263074
284598
284760
286416
296320
304717
312475
312975
315594
315900
319059
319536
326452
329346
329656
336550
336960
338296
341653
346968
361989
362992
365638
368550
369189
371893
378400
378418
378450
384912
386415
392566
404968
414895
416650
416988
428980
429664
447916
456840
457600
458640
475380
486720
489159
489955
498550
516879
529672
536539
538650
559188
567648
568750
629680
638950
673920
679500
729688
736695
738468
769792
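
The search above tests every number by permuting its digits. A different approach, sketched here as an alternative rather than as the method used above, loops over candidate fang pairs instead and keeps any product whose digits match. The function name is made up, and it assumes an even number of digits.

```python
def vampire_numbers(num_digits):
    """Vampire numbers with num_digits digits (num_digits assumed even), found via fang pairs."""
    half = num_digits // 2
    lo, hi = 10 ** (half - 1), 10 ** half     # fangs have exactly half the digits
    found = set()
    for x in range(lo, hi):
        for y in range(x, hi):                # y >= x avoids checking each pair twice
            if x % 10 == 0 and y % 10 == 0:   # both fangs may not end in zero
                continue
            n = x * y
            if len(str(n)) == num_digits and sorted(str(n)) == sorted(str(x) + str(y)):
                found.add(n)
    return sorted(found)

four_digit = vampire_numbers(4)
print(four_digit)   # [1260, 1395, 1435, 1530, 1827, 2187, 6880]
```

Calling `vampire_numbers(6)` covers the six-digit case with roughly 400,000 fang pairs, far fewer checks than permuting the digits of every number below 1000000.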