### Assignment 3



#### **Task 1 Code refactoring**
###### This task is a pause-reflect-and-improve code refactoring practice. You are required to restructure your code from the first two assignments and to improve its internal structure and non-functional features. Specifically, you are encouraged to:

* discover, design, and create potentially reusable functions for your code.
* rewrite some of the code to make it more concise and more efficient using techniques like comprehensions.
  * list comprehensions
  * dictionary comprehensions
  * set comprehensions
  * generator expressions
* add the comments and documentation needed to explain the logic and key steps.
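
As a quick illustration of the four forms listed above, here is a minimal sketch on made-up data. The `records` list and every name in it are hypothetical and are not part of the assignment data.

```python
# Hypothetical sample data: (state code, county name) pairs.
records = [("01", "Washington"), ("05", "Jefferson"),
           ("08", "Washington"), ("12", "Franklin")]

# List comprehension: keep every county name, duplicates included.
names = [name for _, name in records]

# Set comprehension: keep each county name once.
unique_names = {name for _, name in records}

# Dictionary comprehension: map each unique name to how often it occurs.
name_counts = {name: names.count(name) for name in unique_names}

# Generator expression: add up the name lengths without building a list.
total_chars = sum(len(name) for name in names)
```

The generator expression is the only form that does not materialize a collection, which is why it is passed directly to `sum`.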

Keep in mind not to overdo it. For example, comprehensions are just another form of for loop. They make code look concise, but they can also make it hard to understand. Comments are helpful too, but the main body of a program file should be code, not explanatory text; ideally, the code is self-explanatory. Overall, we need to strike a balance between concise syntax and comprehensibility.

For those who are still behind schedule, this is an opportunity to catch up. For those who have finished all the assignments, this is where you can try alternative approaches and gain a higher-level perspective on programming and code structure.

In [None]:
import math
[math.sqrt(k) for k in range(10**2) if k%2 ==0]==[math.sqrt(k) for k in range(0, 10**2,2)]

True

In [None]:
data = JsonfileToData()  # helper defined in the earlier assignments
counties = {f['properties']['NAME'] for f in data}  # set comprehension keeps each name once
print("The unique county names:", counties)
print("The number of unique county names:", len(counties))

def mostCommonName(k):
  featureList = JsonfileToData()
  cntyList = [f['properties']['NAME'] for f in featureList]
  cntyDict = {x:cntyList.count(x) for x in cntyList}
  return sorted(cntyDict.items(), key=lambda item:item[1], reverse=True)[0:k]


# main procedure: get the three most common county names
cnty = mostCommonName(3)
print('The most common names of the counties are {}'.format([x[0] for x in cnty]))


# Derive the numbers of counties that use these three names, respectively. For each of them, list their county name and state code.
def commonCountries(data, k):
    cnty=mostCommonName(k)
    names = {x[0] for x in cnty}  # build the lookup set once instead of once per feature
    items = [[f['properties']['STATE'], f['properties']['NAME']] for f in data if f['properties']['NAME'] in names]
    return cnty, items

# main procedure: get the three most common county names with their state codes
cnty, items = commonCountries(data,3)
print('The numbers of counties that use these most common names are {}'.format(cnty))
print('The top 3 most common county names, with state codes:\n{}  {}'.format("state", "county"))
print('\n'.join('{}     {}'.format(item[0], item[1]) for item in items))
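
One possible refactoring of the name counting, offered as a sketch rather than the required solution: `collections.Counter` tallies all names in one pass, whereas calling `list.count` inside a comprehension rescans the list for every element. The function name `most_common_names` and the `sample` features are hypothetical stand-ins; the real data would come from `JsonfileToData()`.

```python
from collections import Counter

def most_common_names(features, k):
    """Return the k most frequent county names as (name, count) pairs."""
    return Counter(f["properties"]["NAME"] for f in features).most_common(k)

# Hypothetical stand-in for the feature list used in the cell above.
sample = [
    {"properties": {"STATE": "01", "NAME": "Washington"}},
    {"properties": {"STATE": "05", "NAME": "Jefferson"}},
    {"properties": {"STATE": "08", "NAME": "Washington"}},
    {"properties": {"STATE": "12", "NAME": "Franklin"}},
    {"properties": {"STATE": "13", "NAME": "Jefferson"}},
    {"properties": {"STATE": "16", "NAME": "Washington"}},
]

top = most_common_names(sample, 2)

# Collect (state, name) pairs for the counties that use one of the top names.
top_names = {name for name, _ in top}
matches = [(f["properties"]["STATE"], f["properties"]["NAME"])
           for f in sample if f["properties"]["NAME"] in top_names]
```

Passing the feature list in as an argument also avoids reloading the file inside the function.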

#### **Task 2 OMNY's 7-day cap**
According to the MTA, OMNY offers fare capping that is equivalent to a 7-day pass.

>Simply tap and go with the same contactless credit or debit card, smart device, or OMNY card. Once you’ve hit 12 paid trips in a seven-day period, the rest of your rides for that week will automatically be free.

We will use simulated data to develop and test an algorithm that determines whether to charge a user when s/he taps an OMNY terminal.

In [None]:
# simulate the ID card
import random
import string

def generatePayee(k):
    n= k+200
    s = string.ascii_lowercase + string.digits
    payee_ids = [None]*n

# set the random seed, so results are replicable
    random.seed( 1090 )

    for i in range(n):
        payee_ids[i] = ''.join(random.choice(s) for _ in range(10))

# A nested comprehension would be much more concise,
# but it is a little harder to read because a generator expression sits inside a list comprehension:
# payee_ids = [''.join(random.choice(s) for i in range(10)) for _ in range(n)]

# Exclude any possible duplicates (extremely unlikely, but possible)
    payee_ids = list(set(payee_ids))[0:k]
    return payee_ids


# main procedure
print(generatePayee(10))

['290hc8h8qq', 'tth387umwk', 'hqjeeiw3xa', '9w2vjl0jxm', 'qga8s1n1me', '8zoywa5q59', 'kuiwndffyc', 'gwgft51fiw', 'h8z3gq517q', '39q5e0viad']
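
A note on reproducibility: `list(set(...))` returns the IDs in an arbitrary order, and because Python randomizes string hashes per process by default, that order can change between runs even with a fixed seed. The sketch below is one way to remove duplicates while keeping generation order; `generate_ids` and its parameters are hypothetical names, not part of the assignment code.

```python
import random
import string

def generate_ids(k, seed=1090, length=10):
    """Return k unique random IDs in the order they were generated."""
    rng = random.Random(seed)            # private generator, so the global seed is untouched
    alphabet = string.ascii_lowercase + string.digits
    ids = {}                             # dict keys are unique and keep insertion order
    while len(ids) < k:
        ids[''.join(rng.choice(alphabet) for _ in range(length))] = None
    return list(ids)

ids = generate_ids(10)
```

Looping until `k` unique IDs exist also removes the need to generate a fixed surplus of candidates.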


In [None]:
from datetime import datetime, timedelta

def generatePayment(tDate, dd): # tDate: current date, dd: number of days
    payee_ids = generatePayee(1000)
    nn = len(payee_ids)*dd*2 # on average, each payee makes two trips per day.

# The time of day does not really matter for the OMNY rule, so each payment is simply
# shifted back from tDate by a random number of days, hours (5 to 23), minutes, and seconds.
    payTime = [tDate - timedelta(days=random.uniform(0, dd),
                                 hours=random.choice(range(5, 24)),
                                 minutes=random.choice(range(0, 60)),
                                 seconds=random.choice(range(0, 60))) for _ in range(nn)]

    payees = random.choices(payee_ids, weights=[abs(random.normalvariate(0,1)) for _ in range(len(payee_ids))], k=nn)

    simData = [list(x) for x in zip(payees, payTime)]
# Sort by time, from latest to earliest (reverse=True).
    simSData = sorted(simData, key= lambda z: z[1],  reverse = True)
    return simSData

# Main procedure:  Print the top 10 rows
tDate = datetime(2023, 11, 25)
simSData = generatePayment(tDate, 7)
for k in range(10):
    print()
    print("Index: {}, ID: {}, Time: {}".format(k, simSData[k][0], "" + simSData[k][1].strftime("%Y/%m/%d %I:%M%p, %A")))


Index: 0, ID: lz8yj49fol, Time: 2023/11/24 06:23PM, Friday

Index: 1, ID: f4sypdxy9k, Time: 2023/11/24 06:07PM, Friday

Index: 2, ID: 4e8pxmzkqh, Time: 2023/11/24 05:37PM, Friday

Index: 3, ID: sxzqhzk3af, Time: 2023/11/24 05:27PM, Friday

Index: 4, ID: bnwc0j07xg, Time: 2023/11/24 05:26PM, Friday

Index: 5, ID: yx6un9l950, Time: 2023/11/24 05:13PM, Friday

Index: 6, ID: 7pyjzwfjam, Time: 2023/11/24 04:26PM, Friday

Index: 7, ID: egkylqw3jp, Time: 2023/11/24 04:16PM, Friday

Index: 8, ID: 9me4w0n6r1, Time: 2023/11/24 04:14PM, Friday

Index: 9, ID: cmkboqyfvy, Time: 2023/11/24 04:12PM, Friday


So, we have simulated data in the form of a list of lists, each holding the ID of a payee device and the time it was used to pay an MTA fare. Now you can write code that uses the OMNY 7-day capping rule to determine whether each payment should be charged a fare. The code should produce a float (the fare) or a boolean (free or not) for each payment, that is, for each "row" in the list.
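
One way to approach this, sketched under stated assumptions: the cap is treated as a rolling 7-day window, only paid trips count toward the 12, and `fares_for_taps`, `WINDOW`, and `CAP` are hypothetical names. The function processes the taps in chronological order and keeps, per payee, a queue of paid trips that are still inside the window.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)   # assumed rolling window
CAP = 12                     # paid trips needed before rides become free

def fares_for_taps(taps, fare):
    """Return (payee, time, fare) for every tap, in chronological order."""
    paid = defaultdict(deque)            # payee -> times of paid trips inside the window
    result = []
    for payee, when in sorted(taps, key=lambda t: t[1]):
        history = paid[payee]
        # Drop paid trips that are 7 days old or older.
        while history and when - history[0] >= WINDOW:
            history.popleft()
        if len(history) >= CAP:
            result.append((payee, when, 0.0))    # cap reached: free ride
        else:
            history.append(when)                 # record this tap as a paid trip
            result.append((payee, when, fare))
    return result

# Usage on made-up taps: payee "a" taps hourly 14 times, then once 8 days later.
start = datetime(2024, 2, 18, 8, 0)
taps = [["a", start + timedelta(hours=i)] for i in range(14)]
taps += [["b", start + timedelta(minutes=30)], ["a", start + timedelta(days=8)]]
out = fares_for_taps(taps, 2.29)
a_fares = [f for p, _, f in out if p == "a"]
b_fares = [f for p, _, f in out if p == "b"]
```

Each tap is appended to and removed from a queue at most once, so the work after the initial sort grows linearly with the number of taps.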

In [None]:
# algorithm 1 : track the payment history for the target customer
def capping(current, test, fare):
  simSData = generatePayment(current, 8)
  trips = 0
  print("Tracking payment history:")
  for payment in simSData:
      if (test[1] - payment[1]).total_seconds() >= 7*24*3600:  # history is sorted latest first, so every remaining payment is outside the 7-day window
          return '$'+str(fare)
      if (payment[0]==test[0]):
          trips +=1
          print("Index: {}, ID: {}, Time: {}".format(trips, payment[0], "" + payment[1].strftime("%Y/%m/%d %I:%M%p, %A")))
      if trips == 12:
          return "Free" # 12 trips found within 7 days: the cap is reached
  return '$'+str(fare) # history exhausted before 12 trips were found: charge the fare

# main procedure
current = datetime(2024, 2, 25)
fare = 2.29
test = generatePayment(current, 1)[0]   # generate the target payment to be judged using the OMNY 7-day capping rule
print("One customer comes in :", "ID: {}, Time: {}".format(test[0], "" + test[1].strftime("%Y/%m/%d %I:%M%p, %A")))
print("Final fare based on the OMNY 7-day capping rule is:", capping(current, test, fare))

One customer comes in : ID: 9amf7up454, Time: 2024/02/24 06:02PM, Saturday
Tracking payment history:
Index: 1, ID: 9amf7up454, Time: 2024/02/20 04:34AM, Tuesday
Index: 2, ID: 9amf7up454, Time: 2024/02/18 01:23AM, Sunday
Index: 3, ID: 9amf7up454, Time: 2024/02/18 01:15AM, Sunday
Final fare based on the OMNY 7-day capping rule is: $2.29


In [None]:
# algorithm 2: extract the target customer's payments from the payment history
def capping2(current, test, fare):
  simSData = generatePayment(current, 8)
  target = [payment for payment in simSData if ((payment[0]==test[0]) and ((test[1] - payment[1]).total_seconds() < 7*24*3600))]
  if len(target) >= 12:
    return "Free"  # meet the OMNY 7-day capping rule
  print("Tracking payment history:")
  print('\n'.join("ID: {}, Time: {}".format(payment[0], payment[1].strftime("%Y/%m/%d %I:%M%p, %A")) for payment in target))
  return '$'+str(fare) # fewer than 12 trips within the 7-day window: charge the fare

# main procedure
current = datetime(2024, 2, 25)
fare = 2.29
test = generatePayment(current, 1)[0]   # generate the target payment to be judged using the OMNY 7-day capping rule
print("One customer comes in :", "ID: {}, Time: {}".format(test[0], "" + test[1].strftime("%Y/%m/%d %I:%M%p, %A")))
print("Final fare based on the OMNY 7-day capping rule is:", capping2(current, test, fare))

One customer comes in : ID: 9amf7up454, Time: 2024/02/24 06:02PM, Saturday
Tracking payment history:
ID: 9amf7up454, Time: 2024/02/20 04:34AM, Tuesday
ID: 9amf7up454, Time: 2024/02/18 01:23AM, Sunday
ID: 9amf7up454, Time: 2024/02/18 01:15AM, Sunday
Final fare based on the OMNY 7-day capping rule is: $2.29
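
Both algorithms regenerate the payment history inside the function. A possible variation, shown as a sketch only, passes the history in as an argument and counts only trips that happened before the tap being judged. The name `is_free` and its `cap` and `window` parameters are assumptions for illustration.

```python
from datetime import datetime, timedelta

def is_free(history, tap, cap=12, window=timedelta(days=7)):
    """Return True if the payee already made `cap` trips in the window before `tap`."""
    payee, when = tap
    # Keep trips by the same payee that are strictly earlier than the tap
    # and less than one window old.
    recent = [t for p, t in history
              if p == payee and timedelta(0) < when - t < window]
    return len(recent) >= cap

# Made-up history: 12 earlier trips by "a", one later trip, one trip by "b".
tap = ["a", datetime(2024, 2, 24, 18, 0)]
earlier = [["a", tap[1] - timedelta(hours=h)] for h in range(1, 13)]
extra = [["a", tap[1] + timedelta(hours=1)], ["b", tap[1] - timedelta(hours=2)]]

capped = is_free(earlier + extra, tap)
not_capped = is_free(earlier[:11] + extra, tap)
```

Returning a boolean keeps the decision separate from how the fare is formatted for printing.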


In [None]:
def simulateCapping(current, k):
    fare = 2.29
    test = generatePayment(current, 1)
    for i in range(k):
        print("Final fare for customer ID: {}, Time: {}, Fare: {}".format(test[i][0], "" + test[i][1].strftime("%Y/%m/%d %I:%M%p, %A"),capping(current, test[i], fare)))

# main procedure
current = datetime(2024, 2, 25)
simulateCapping(current, 10)


Tracking payment history:
Index: 1, ID: 9amf7up454, Time: 2024/02/20 04:34AM, Tuesday
Index: 2, ID: 9amf7up454, Time: 2024/02/18 01:23AM, Sunday
Index: 3, ID: 9amf7up454, Time: 2024/02/18 01:15AM, Sunday
Final fare for customer ID: 9amf7up454, Time: 2024/02/24 06:02PM, Saturday, Fare: $2.29
Tracking payment history:
Index: 1, ID: hn3wipkv2u, Time: 2024/02/23 09:47AM, Friday
Index: 2, ID: hn3wipkv2u, Time: 2024/02/20 09:42PM, Tuesday
Index: 3, ID: hn3wipkv2u, Time: 2024/02/18 05:27PM, Sunday
Index: 4, ID: hn3wipkv2u, Time: 2024/02/17 10:03PM, Saturday
Final fare for customer ID: hn3wipkv2u, Time: 2024/02/24 05:59PM, Saturday, Fare: $2.29
Tracking payment history:
Index: 1, ID: 0hngwmrc4o, Time: 2024/02/24 03:14AM, Saturday
Index: 2, ID: 0hngwmrc4o, Time: 2024/02/23 02:33AM, Friday
Index: 3, ID: 0hngwmrc4o, Time: 2024/02/22 03:55PM, Thursday
Index: 4, ID: 0hngwmrc4o, Time: 2024/02/22 12:26AM, Thursday
Index: 5, ID: 0hngwmrc4o, Time: 2024/02/21 09:41AM, Wednesday
Index: 6, ID: 0hngwmrc4o,