Adding clarifying comments and README file to Travis Script

Jason Kopp committed Jul 11, 2019
1 parent 751ba7a commit 54e49dabed7c4ef0a6cd1f73a273fe87ec1c68e2
Showing with 90 additions and 21 deletions.
  1. +2 −2 4CC_Automation/README.md
  2. +25 −19 scripts/PRsanitycheck.py
  3. +63 −0 scripts/README.md
@@ -1,15 +1,15 @@
# 4CC Automation Script
Created by Jason Kopp for David Singer of Apple Inc.
First uploaded to GitHub: 5/30/2019
Last updated: 5/30/2019
Last updated: 7/11/2019

## Description
The 4CCAutomationScript.py was created to automate finding unregistered or mistakenly registered specifications on the MP4RA website. The script takes a specification file, the folder of CSV files from the MP4RA (CSV/), finds all the four character codes (4CCs) in each, and compares what is in the specification file to what is registered in the MP4RA.

## unlisted.csv and textualcontent.csv
We created two new MP4RA CSV files: CSV/unlisted.csv and CSV/textualcontent.csv.
- unlisted.csv should store 4CCs that are purposely unlisted/unregistered. Meaning, we know they are missing from the MP4RA and they should stay that way, at least for now.
- textualcontent.csv should store four-character strings that this script mistakenly picks up but that are not 4CCs. The script matches any four-character string between single quotes (the regex being used is: `[\'\‘\’][A-Za-z0-9 +-]{4}[\'\‘\’]`), so false positives happen (e.g. " to ", "also", etc.).
- textualcontent.csv should store four-character strings that this script mistakenly picks up but that are not 4CCs. The script matches any four-character string between single quotes (the regex being used is: `[\'\‘\’].{4}[\'\‘\’]`), so false positives happen (e.g. " to ", "also", etc.).
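
As a rough illustration of why textualcontent.csv is needed, the sketch below applies the looser regex quoted above to a made-up sentence. The capture group, the function name `find_candidate_4ccs`, and the sample text are assumptions added for the example; they are not taken from 4CCAutomationScript.py.

```python
import re

# Regex quoted in the README, with a capture group added (an assumption for
# this sketch) so that only the four characters between the quotes are returned.
QUOTED_4CC = re.compile(r"['‘’](.{4})['‘’]")

def find_candidate_4ccs(text):
    """Return the unique four-character strings found between single quotes."""
    return sorted(set(QUOTED_4CC.findall(text)))

# Hypothetical specification text: two real-looking codes and one ordinary word.
sample = "The 'moov' box holds ‘trak’ boxes; see 'also' the notes."
print(find_candidate_4ccs(sample))
```

Here 'also' is matched even though it is plain English, which is the kind of entry textualcontent.csv is meant to record.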

## How to Run
To run this script on your own specifications type in terminal:
@@ -1,4 +1,4 @@
#!/usr/bin/env python3
4CC#!/usr/bin/env python3
import csv, re, os

# Cycles through the CSV files in the MP4RA repo and returns a tuple containing 1.) a list of all the 4CCs and their associated columns and 2.) a list of the specifications and their associated columns
@@ -47,24 +47,26 @@ def getCSV4CCs(directory):
speclist.append([linkname, spec, desc])
return (codesInCSV, speclist)

#Check to ensure all 4ccs are actually four characters matching the regex below
# 1. Valid, Four Characters Check
# Check to ensure all 4CCs are actually four characters long and valid characters matching the regex below
def notfourcharacters(codes, exceptions=[]):
pattern = re.compile(u'^[\u0020-\u007E]{4}$', re.UNICODE)
mistakeCodes = []
for code in codes:
if pattern.match(code[0]) == None:
if code[0] not in exceptions:
mistakeCodes.append([code[0], code[3]])
print("\nFour Character Codes Test:")
print("\n1. Valid, Four Characters Check:")
if mistakeCodes == []:
print("\tAll 4ccs are four characters - PASS")
print("\tAll 4CCs are valid, four characters - PASS")
return 0
elif mistakeCodes != []:
for i in mistakeCodes:
print("\t'%s' from '%s'" % (i[0], i[1]))
print("\tAll 4ccs are either longer than four characters or not valid - FAIL")
print("\tAll 4CCs are either longer than four characters or not valid - FAIL")
return 1

# 2. Duplicate 4CC Check
# Finds duplicated codes. Only fails the check if the duplicates are in the same CSV file
def duplicatecodes(codes, exceptions=[]):
#First build a list of just the 4CCs excluding any you want to exclude
@@ -76,7 +78,7 @@ def duplicatecodes(codes, exceptions=[]):
dups.append(codes[i])
dupssorted = sorted(dups)

print("\nDuplicate 4CCs Test:")
print("\n2. Duplicate 4CC Check:")
if dupssorted == []:
print("\tNo duplicates found - PASS")
return 0
@@ -103,20 +105,21 @@ def duplicatecodes(codes, exceptions=[]):
print("\tNo duplicates found in the same CSV - PASS")
return 0

# Create and return the known duplicates file
# Create and return the known duplicates file for exceptions to the duplicatecodes check
def knownduplicates(filename):
with open(filename, 'r') as file:
knownduplicatescsv = [row.replace('$20', ' ').replace('\n', '') for row in file]
return knownduplicatescsv

#Check to make sure all the codes that have specexceptions are registered in the specifications.csv file
# 3. Registered Specification Check
# Check to make sure all the codes that have Specifications are registered in the specifications.csv file
def registerspecs(codesInCSV, speclist, specexceptions=[]):
unregisteredspecs = []
allspecs = [spec[1] for spec in speclist]+specexceptions
for a in range(len(codesInCSV)):
if codesInCSV[a][2] not in allspecs:
unregisteredspecs.append(codesInCSV[a])
print("\nRegistered Specs Test:")
print("\n3. Registered Specification Check:")
if unregisteredspecs == []:
print("\tAll specs are registered - PASS")
return 0
@@ -126,7 +129,8 @@ def registerspecs(codesInCSV, speclist, specexceptions=[]):
print("\tThere are unregistered specs - FAIL")
return 1

#Find CSV Rows that have missing columns
# 4. Missing Columns Check
# Find CSV Rows that have missing columns
def filledcolumns(codesInCSV):
missingcols=[]
for row in codesInCSV:
@@ -141,7 +145,7 @@ def filledcolumns(codesInCSV):
# removes duplicates that arise from rows that have multiple blank cols
newmissingcols = set(notsamplemissing)
# return value
print("\nMissing Columns Test:")
print("\n4. Missing Columns Check:")
if newmissingcols == set():
print("\tNo missing columns - PASS")
return 0
@@ -151,14 +155,15 @@ def filledcolumns(codesInCSV):
print("\tThese specs have missing columns - FAIL")
return 1

# 5. Registered Handlers Check
# Like the specification check, check to ensure all handlers that are used are registered in handlers.csv
def registerhandle(codesInCSV, handleexceptions):
unregisteredhandles = []
allhandles = [handle[1] for handle in codesInCSV if handle[3] == "handlers.csv"]+handleexceptions
for a in range(len(codesInCSV)):
if codesInCSV[a][4] not in allhandles:
unregisteredhandles.append(codesInCSV[a])
print("\nRegistered Handles Test:")
print("\n5. Registered Handlers Check:")
if unregisteredhandles == []:
print("\tAll handles are registered - PASS")
return 0
@@ -180,27 +185,28 @@ def prsanitycheck():

codesspecs = getCSV4CCs(repo)

#TEST for four characters
# 1. Valid, Four Characters Check
codeExceptions = [] #Type in exceptions if you need to
not4ccs = notfourcharacters(codesspecs[0], codeExceptions)
not4CCs = notfourcharacters(codesspecs[0], codeExceptions)

#Test for Duplicates
# 2. Duplicate 4CC Check
knownduplicateslist = knownduplicates(repo+"knownduplicates.csv")
duplicates = duplicatecodes(codesspecs[0], knownduplicateslist)

#Test for Specifications
# 3. Registered Specification Check
specexceptions = ["see (1) below"]
unregisteredspecs = registerspecs(codesspecs[0], codesspecs[1], specexceptions)

#Test for Filled in Columns
# 4. Missing Columns Check
emptycols = filledcolumns(codesspecs[0])

#Test for registered handle types. Must leave "n/a" as a handleexceptions because that is introduced by the script.
# 5. Registered Handlers Check
# Must leave "n/a" in handleexceptions because that is introduced by the script.
handleexceptions = ["n/a", "(various)", "General"]
unregisteredhandles = registerhandle(codesspecs[0], handleexceptions)

# Exit Codes
returnvalue = (not4ccs + duplicates + unregisteredspecs + emptycols + unregisteredhandles)
returnvalue = (not4CCs + duplicates + unregisteredspecs + emptycols + unregisteredhandles)
if returnvalue == 0:
print("\nPR passed all checks")
exit(0)
@@ -0,0 +1,63 @@
# PR Sanity Check Script
Created by Jason Kopp for David Singer of Apple Inc.
First uploaded to GitHub: 6/21/2019
Last updated: 7/11/2019

## Description
PRsanitycheck.py was created to automate testing pull requests submitted to the MP4RA GitHub repo using [Travis CI](https://travis-ci.org). When a pull request is submitted to the repo, this script is automatically started and the results are reported on the Travis CI website and linked to GitHub in the "Conversation" tab of the PR.

## Five Checks
The script runs five different "checks".

1. Valid, Four Characters Check
- Checks to ensure all 4CCs are four characters long and within the valid Unicode character range (0x20 to 0x7E inclusive, i.e. printable ASCII including the space)
- To do this, I use this regex: `^[\u0020-\u007E]{4}$`
2. Duplicate 4CC Check
- Checks for 4CCs that are used multiple times. All duplicates are reported on Travis CI, but the check fails only if the same 4CC appears more than once within a single CSV file.
3. Registered Specification Check
- Checks to ensure any 4CCs that have a Specification also register that Specification in the specifications.csv file.
4. Missing Columns Check
- Checks for any missing columns.
- The 5th column of sample-entries.csv is ignored because it is not mandatory.
5. Registered Handlers Check
- Checks to ensure any 4CCs that have a Handler also register that Handler in the handlers.csv file.
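
The first two checks can be sketched roughly as follows. This is a simplified illustration, not the code in PRsanitycheck.py: the function names and the `(code, filename)` row shape are assumptions made for the example, and only the regex is taken from the description above.

```python
import re
from collections import Counter

# Regex from check 1: exactly four characters in the range 0x20 to 0x7E.
FOURCC = re.compile(r"^[\u0020-\u007E]{4}$")

def invalid_codes(codes):
    """Return the codes that are not four valid characters (check 1)."""
    return [code for code in codes if FOURCC.match(code) is None]

def same_file_duplicates(rows):
    """Return codes that occur more than once within one CSV file (check 2).

    rows is a list of hypothetical (code, csv_filename) pairs. A code that
    appears once in each of two different files does not fail the check.
    """
    counts = Counter(rows)
    return sorted({code for (code, _filename), n in counts.items() if n > 1})

print(invalid_codes(["moov", "url ", "toolong"]))
print(same_file_duplicates([
    ("ID32", "boxes.csv"),
    ("ID32", "handlers.csv"),
    ("free", "boxes.csv"),
    ("free", "boxes.csv"),
]))
```

In the second call, 'ID32' is only reported as a cross-file duplicate in the real script, so it is left out here, while 'free' repeats inside boxes.csv and would fail the check.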

## Exceptions
You may need to provide exceptions to these checks to force them to pass manually. You can provide exceptions for all but the "Missing Columns Check".

1. Valid, Four Characters Check
- Python list of 4CCs at the bottom of the script: `codeExceptions = []`
2. Duplicate 4CC Check
- List any known duplicates in the knownduplicates.csv file
3. Registered Specification Check
- Python list of specifications at the bottom of the script: `specexceptions = ["see (1) below"]`
4. Missing Columns Check
- No exceptions allowed
5. Registered Handlers Check
- Python list of handlers at the bottom of the script: `handleexceptions = ["n/a", "(various)", "General"]`

## Running Locally
The folder hierarchy differs between Travis and a local clone, so you need to change where the script searches for the CSV files when you run it locally. At the bottom of the script, comment out `repo = "github"` and uncomment `repo = "local"`. Reverse the change before pushing back up to GitHub.
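
One possible way to avoid the manual toggle, shown only as a sketch and not something the script currently does, is to pick the mode from the environment. Travis CI sets the `TRAVIS` environment variable to `true` during builds; the helper name `choose_repo_mode` is hypothetical.

```python
import os

def choose_repo_mode(environ=os.environ):
    """Return "github" when running under Travis CI, otherwise "local"."""
    # Travis CI exports TRAVIS=true in its build environment.
    return "github" if environ.get("TRAVIS") == "true" else "local"

# Assumed to replace the hand-edited `repo = ...` line at the bottom of the script.
repo = choose_repo_mode()
print(repo)
```

With this approach the same file could be committed unchanged for both environments, at the cost of tying the script to a Travis-specific variable.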

## Output
On the Travis CI website, you will see the following when the PR passes all checks. You may need to expand the `$ ./scripts/PRsanitycheck.py` line.

```
$ ./scripts/PRsanitycheck.py
1. Valid, Four Characters Check:
All 4CCs are valid, four characters - PASS
2. Duplicate 4CC Check:
['ID32', 'id3 version 2 container', 'id3v2', 'boxes.csv', 'n/a', 'n/a', 'n/a']
['ID32', 'id3 version 2 meta-data handler (meta box)', 'id3v2', 'handlers.csv', 'n/a', 'n/a', 'handler']
...
['url ', 'a url', 'jpeg2000', 'boxes.csv', 'n/a', 'n/a', 'n/a']
['url ', 'url data location', 'iso', 'data-references.csv', 'n/a', 'n/a', 'data reference']
No duplicates found in the same CSV - PASS
3. Registered Specification Check:
All specs are registered - PASS
4. Missing Columns Check:
No missing columns - PASS
5. Registered Handlers Check:
All handles are registered - PASS
PR passed all checks
```
