Merge pull request #78 from jasonkopp/readme_updates

4CC and Travis Scripts README and comment updates
dwsinger committed Oct 11, 2019
2 parents 3b182ac + 98f10ad commit 9beb558b5568c493036030d6df35d60003ee9e86
Showing with 100 additions and 27 deletions.
  1. +2 −2 4CC_Automation/README.md
  2. +31 −25 scripts/PRsanitycheck.py
  3. +67 −0 scripts/README.md
4CC_Automation/README.md
@@ -1,15 +1,15 @@
# 4CC Automation Script
Created by Jason Kopp for David Singer of Apple Inc.
First uploaded to Github: 5/30/2019
-Last updated: 5/30/2019
+Last updated: 7/11/2019

## Description
4CCAutomationScript.py was created to automate finding unregistered or mistakenly registered specifications on the MP4RA website. The script takes a specification file and the folder of CSV files from the MP4RA (CSV/), finds all the four-character codes (4CCs) in each, and compares what is in the specification file with what is registered in the MP4RA.

## unlisted.csv and textualcontent.csv
We created two new MP4RA CSV files: CSV/unlisted.csv and CSV/textualcontent.csv.
- unlisted.csv should store 4CCs that are purposely unlisted/unregistered. That is, we know they are missing from the MP4RA and that they should stay that way, at least for now.
-- textualcontent.csv should store four character long strings that are mistakenly found by this script but are not 4CCs. This script finds any four character long strings that are between single quotes (The regex being used is: `[\'\‘\’][A-Za-z0-9 +-]{4}[\'\‘\’]`) so mistakes happen (e.g. " to ", "also", etc.).
+- textualcontent.csv should store four character long strings that are mistakenly found by this script but are not 4CCs. This script finds any four character long strings that are between single quotes (The regex being used is: `[\'\‘\’].{4}[\'\‘\’]`) so mistakes happen (e.g. " to ", "also", etc.).
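The matching and filtering described above can be sketched in Python. This is an illustration under assumptions, not the code of 4CCAutomationScript.py: the function names are hypothetical, the pattern is the README's `[\'\‘\’].{4}[\'\‘\’]` written without the redundant escapes and with a capture group added, and the filtering simply drops anything listed in textualcontent.csv or unlisted.csv before comparing against the registered codes.

```python
import re

# A straight or curly single quote, any four characters, then another quote.
QUOTED_4CC = re.compile(r"['‘’](.{4})['‘’]")

def candidate_4ccs(text):
    # findall() returns the captured four characters; dict.fromkeys keeps
    # the first occurrence of each candidate, in order.
    return list(dict.fromkeys(QUOTED_4CC.findall(text)))

def unregistered_4ccs(text, registered, textual=(), unlisted=()):
    # Drop known false positives (textualcontent.csv) and codes that are
    # intentionally unregistered (unlisted.csv), then keep whatever the
    # MP4RA CSV files do not list.
    skip = set(textual) | set(unlisted)
    return [c for c in candidate_4ccs(text)
            if c not in skip and c not in registered]
```

Because `.` accepts any character, the pattern also catches quoted text such as `' to '`, which is exactly the kind of string textualcontent.csv exists to suppress.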

## How to Run
To run this script on your own specifications, type the following in a terminal:
scripts/PRsanitycheck.py
@@ -13,9 +13,9 @@ def getCSV4CCs(directory):
headers = csvReader.fieldnames
if 'code' in headers:
for row in csvReader:
-#Replaces spaces with underscores and then $20 with spaces.
-#I needed to build the CSV files into python using "n/a" because otherwise the travis check would rearrange the columns. So including all the information I needed in every line of the python csv object was my only solution for solving that issue.
-csvCode = row['code'].replace(' ', '_').replace('$20', ' ')
+# Replaces spaces with '✀' and then $20 with spaces. Spaces are a valid unicode character that can be used, but on the MP4RA site they are displayed as "$20". So there should not be any space characters in the repo. The script converts spaces into characters outside the valid 4CC range, so they fail the test. I then convert $20 into single space characters to simplify the regex being used.
+csvCode = row['code'].replace(' ', '✀').replace('$20', ' ')
+#I needed to build the CSV files into python using "n/a" because otherwise the travis check would rearrange the columns. So including all the information I needed in every line of the python csv list was my only solution for solving that issue.
if 'description' in headers:
csvDesc = row['description']
else:
@@ -47,24 +47,26 @@ def getCSV4CCs(directory):
speclist.append([linkname, spec, desc])
return (codesInCSV, speclist)

-#Check to ensure all 4ccs are actually four characters matching the regex below
+# 1. Valid, Four Characters Check
+# Check to ensure all 4CCs are actually four characters long and valid characters matching the regex below
def notfourcharacters(codes, exceptions=[]):
pattern = re.compile(u'^[\u0020-\u007E]{4}$', re.UNICODE)
mistakeCodes = []
for code in codes:
if pattern.match(code[0]) == None:
if code[0] not in exceptions:
mistakeCodes.append([code[0], code[3]])
-print("\nFour Character Codes Test:")
+print("\n1. Valid, Four Characters Check:")
if mistakeCodes == []:
-print("\tAll 4ccs are four characters - PASS")
+print("\tAll 4CCs are valid, four characters - PASS")
return 0
elif mistakeCodes != []:
for i in mistakeCodes:
print("\t'%s' from '%s'" % (i[0], i[1]))
-print("\tAll 4ccs are either longer than four characters or not valid - FAIL")
+print("\tAll 4CCs are either longer than four characters or not valid - FAIL")
return 1

+# 2. Duplicate 4CC Check
#Finds duplicated codes. Only fails the check if the duplicates are in the same CSV file
def duplicatecodes(codes, exceptions=[]):
#First build a list of just the 4CCs excluding any you want to exclude
@@ -76,7 +78,7 @@ def duplicatecodes(codes, exceptions=[]):
dups.append(codes[i])
dupssorted = sorted(dups)

-print("\nDuplicate 4CCs Test:")
+print("\n2. Duplicate 4CC Check:")
if dupssorted == []:
print("\tNo duplicates found - PASS")
return 0
@@ -103,20 +105,21 @@ def duplicatecodes(codes, exceptions=[]):
print("\tNo duplicates found in the same CSV - PASS")
return 0

-# Create and return the known duplicates file
+# Create and return the known duplicates file for exceptions to the duplicatecodes check
def knownduplicates(filename):
with open(filename, 'r') as file:
knownduplicatescsv = [row.replace('$20', ' ').replace('\n', '') for row in file]
return knownduplicatescsv

-#Check to make sure all the codes that have specexceptions are registered in the specifications.csv file
+# 3. Registered Specification Check
+# Check to make sure all the codes that have Specifications are registered in the specifications.csv file
def registerspecs(codesInCSV, speclist, specexceptions=[]):
unregisteredspecs = []
allspecs = [spec[1] for spec in speclist]+specexceptions
for a in range(len(codesInCSV)):
if codesInCSV[a][2] not in allspecs:
unregisteredspecs.append(codesInCSV[a])
-print("\nRegistered Specs Test:")
+print("\n3. Registered Specification Check:")
if unregisteredspecs == []:
print("\tAll specs are registered - PASS")
return 0
@@ -126,7 +129,8 @@ def registerspecs(codesInCSV, speclist, specexceptions=[]):
print("\tThere are unregistered specs - FAIL")
return 1

-#Find CSV Rows that have missing columns
+# 4. Missing Columns Check
+# Find CSV Rows that have missing columns
def filledcolumns(codesInCSV):
missingcols=[]
for row in codesInCSV:
@@ -141,7 +145,7 @@ def filledcolumns(codesInCSV):
# removes duplicates that arise from rows that have multiple blank cols
newmissingcols = set(notsamplemissing)
# return value
-print("\nMissing Columns Test:")
+print("\n4. Missing Columns Check:")
if newmissingcols == set():
print("\tNo missing columns - PASS")
return 0
@@ -151,14 +155,15 @@ def filledcolumns(codesInCSV):
print("\tThese specs have missing columns - FAIL")
return 1

+# 5. Registered Handlers Check
# Like the specification check, check to ensure all handlers that are used are registered in handlers.csv
-def registerhandle(codesInCSV, handleexceptions):
+def registerhandle(codesInCSV, handlerexceptions):
unregisteredhandles = []
-allhandles = [handle[1] for handle in codesInCSV if handle[3] == "handlers.csv"]+handleexceptions
+allhandles = [handle[1] for handle in codesInCSV if handle[3] == "handlers.csv"]+handlerexceptions
for a in range(len(codesInCSV)):
if codesInCSV[a][4] not in allhandles:
unregisteredhandles.append(codesInCSV[a])
-print("\nRegistered Handles Test:")
+print("\n5. Registered Handlers Check:")
if unregisteredhandles == []:
print("\tAll handles are registered - PASS")
return 0
@@ -180,27 +185,28 @@ def prsanitycheck():

codesspecs = getCSV4CCs(repo)

-#TEST for four characters
+# 1. Valid, Four Characters Check
codeExceptions = [] #Type in exceptions if you need to
-not4ccs = notfourcharacters(codesspecs[0], codeExceptions)
+not4CCs = notfourcharacters(codesspecs[0], codeExceptions)

-#Test for Duplicates
+# 2. Duplicate 4CC Check
knownduplicateslist = knownduplicates(repo+"knownduplicates.csv")
duplicates = duplicatecodes(codesspecs[0], knownduplicateslist)

-#Test for Specifications
+# 3. Registered Specification Check
specexceptions = ["see (1) below"]
unregisteredspecs = registerspecs(codesspecs[0], codesspecs[1], specexceptions)

-#Test for Filled in Columns
+# 4. Missing Columns Check
emptycols = filledcolumns(codesspecs[0])

-#Test for registered handle types. Must leave "n/a" as a handleexceptions because that is introduced by the script.
-handleexceptions = ["n/a", "(various)", "General"]
-unregisteredhandles = registerhandle(codesspecs[0], handleexceptions)
+# 5. Registered Handlers Check
+# Must leave "n/a" in handlerexceptions because that is introduced by the script.
+handlerexceptions = ["n/a", "(various)", "General"]
+unregisteredhandles = registerhandle(codesspecs[0], handlerexceptions)

# Exit Codes
-returnvalue = (not4ccs + duplicates + unregisteredspecs + emptycols + unregisteredhandles)
+returnvalue = (not4CCs + duplicates + unregisteredspecs + emptycols + unregisteredhandles)
if returnvalue == 0:
print("\nPR passed all checks")
exit(0)
scripts/README.md
@@ -0,0 +1,67 @@
# PR Sanity Check Script
Created by Jason Kopp for David Singer of Apple Inc.
First uploaded to Github: 6/21/2019
Last updated: 7/11/2019

## Description
PRsanitycheck.py was created to automate testing of pull requests submitted to the MP4RA GitHub repo using [Travis CI](https://travis-ci.org). When a pull request is submitted, Travis CI runs this script automatically; the results are reported on the Travis CI website and linked from the "Conversation" tab of the PR on GitHub.

## Five Checks
The script runs five checks:

1. Valid, Four Characters Check
- Checks that every 4CC is exactly four characters long and uses only characters in the valid range U+0020 to U+007E inclusive (printable ASCII)
- To do this, I use this regex: `^[\u0020-\u007E]{4}$`
2. Duplicate 4CC Check
- Checks for 4CCs that are used more than once. All duplicates are reported on Travis CI, but the check only fails if duplicates are found in the same CSV file
3. Registered Specification Check
- Checks that every Specification used by a 4CC is registered in the specifications.csv file
4. Missing Columns Check
- Checks for CSV rows with missing (empty) columns
- The 5th column of sample-entries.csv is ignored because it is not mandatory
5. Registered Handlers Check
- Checks that every Handler used by a 4CC is registered in the handlers.csv file
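Check 1, together with the space handling in `getCSV4CCs()`, can be sketched as below. The `(code, csv_filename)` row shape and the helper names are simplifications made for this example; the script itself works on its `codesInCSV` lists. One deliberate difference is the use of `re.fullmatch` instead of `match` with `^...$`: in Python, `$` also matches just before a trailing newline, so `^[\u0020-\u007E]{4}$` would accept `"abcd\n"`.

```python
import re

# Check 1 pattern: exactly four characters from U+0020 to U+007E.
VALID_4CC = re.compile(r"[\u0020-\u007E]{4}")

def normalize_code(raw):
    # Mirrors getCSV4CCs(): a literal space becomes "✀" (outside the
    # valid range, so it fails), then "$20" becomes a real space.
    return raw.replace(" ", "✀").replace("$20", " ")

def invalid_codes(rows, exceptions=()):
    """rows: (code, csv_filename) pairs, a simplified stand-in for codesInCSV."""
    return [(code, name) for code, name in rows
            if VALID_4CC.fullmatch(code) is None and code not in exceptions]
```

With this normalization, `url$20` from the CSV becomes the valid code `"url "`, while a stray literal space in `url ` becomes `"url✀"` and is reported.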

## Exceptions
You may need to add exceptions to force a check to pass. Exceptions can be provided for every check except the "Missing Columns Check".

1. Valid, Four Characters Check
- Python list of 4CCs at the bottom of the script: `codeExceptions = []`
2. Duplicate 4CC Check
- List any known duplicates in the knownduplicates.csv file
3. Registered Specification Check
- Python list of specifications at the bottom of the script: `specexceptions = ["see (1) below"]`
4. Missing Columns Check
- No exceptions allowed
5. Registered Handlers Check
- Python list of handlers at the bottom of the script: `handlerexceptions = ["n/a", "(various)", "General"]`
- "n/a" must remain in handlerexceptions because it is introduced by this script. See "Note" below.
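How the duplicate check uses its exception list can be sketched as follows. This is a simplified illustration, not the script's `duplicatecodes()`: rows are reduced to hypothetical `(code, csv_filename)` pairs, entries read from knownduplicates.csv are excluded first (with `$20` turned back into a space, as `knownduplicates()` does), and only codes repeated inside a single CSV file are returned, since only those make check 2 fail.

```python
from collections import defaultdict

def load_known_duplicates(lines):
    # Same clean-up as knownduplicates(): "$20" -> space, drop newlines.
    return [line.replace("$20", " ").replace("\n", "") for line in lines]

def same_csv_duplicates(rows, exceptions=()):
    """Return the sorted 4CCs that appear more than once in one CSV file."""
    files_by_code = defaultdict(list)
    for code, filename in rows:
        if code not in exceptions:  # known duplicates are skipped entirely
            files_by_code[code].append(filename)
    # A repeated filename means the code is duplicated inside that file.
    return sorted(code for code, files in files_by_code.items()
                  if len(files) != len(set(files)))
```

A code such as `ID32`, which appears once in boxes.csv and once in handlers.csv, is not returned here; in the real script such cross-file duplicates are only reported.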

## Running Locally
The folder hierarchy differs between Travis CI and a local clone of the MP4RA repo, so you need to change where the script looks for the CSV files when you run it locally. At the bottom of the script, comment out `repo = "github"` and uncomment `repo = "local"`, and reverse this before pushing back up to GitHub.
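If the difference between the two environments comes from the working directory the script is launched from (an assumption; the README does not say what changes), deriving the path from the script's own location would remove the manual toggle. The sketch assumes a layout of `<repo>/scripts/PRsanitycheck.py` with the CSV files in `<repo>/CSV/`; `csv_dir` is a hypothetical helper, not part of the script.

```python
import os
from pathlib import Path

def csv_dir(script_path):
    # Hypothetical layout: <repo>/scripts/<script>.py and <repo>/CSV/.
    # os.path.abspath() resolves a relative path against the current
    # working directory without following symlinks.
    return Path(os.path.abspath(script_path)).parent.parent / "CSV"
```

Inside the script this would be called as `csv_dir(__file__)`, which yields the same directory whether the script is started from the repo root or from `scripts/`.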

## Note
Travis CI seems to read CSV files differently from a local run, so I couldn't rely on column indices or other normal means when building variables. Instead, I converted all the CSV rows and columns into a single nested Python list: "codesInCSV". Every 4CC in my script therefore needed a description, specification, handler, ObjectType, and Type, even though not every 4CC actually has all of those attributes. That is why I introduced "n/a" into many of the indices of those nested lists. For instance, every 4CC needed a handler index so I could refer to it later, so those without a Handler were given the Handler "n/a".
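The padding described above, and the reason "n/a" has to stay in `handlerexceptions`, can be illustrated with a small sketch. It assumes the index layout the diff's functions rely on (0 code, 1 description, 2 specification, 3 CSV file name, 4 handler, then ObjectType and Type); the CSV column names other than `code` and `description`, and the helpers `load_rows` and `unregistered_handlers`, are hypothetical. Like `registerhandle()`, the check treats index 1 of each handlers.csv row as a registered handler name.

```python
import csv
import io

# Hypothetical column names for indices 1, 2, 4, 5 and 6.
FIELDS = ["description", "specification", "handler", "ObjectType", "type"]

def load_rows(csv_text, filename):
    """Build codesInCSV-style rows; columns absent from the header become "n/a"."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        desc, spec, handler, objtype, kind = (row.get(f, "n/a") for f in FIELDS)
        rows.append([row["code"], desc, spec, filename, handler, objtype, kind])
    return rows

def unregistered_handlers(rows, exceptions=("n/a",)):
    """Rows whose handler (index 4) is neither registered nor an exception."""
    registered = {r[1] for r in rows if r[3] == "handlers.csv"} | set(exceptions)
    return [r for r in rows if r[4] not in registered]
```

Without "n/a" among the exceptions, every row padded with a "n/a" handler, including the handlers.csv rows themselves, would be reported as unregistered.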

## Output
On the Travis CI website, you will see the following when the PR passes all checks. You may need to expand the `$ ./scripts/PRsanitycheck.py` line.

```
$ ./scripts/PRsanitycheck.py
1. Valid, Four Characters Check:
All 4CCs are valid, four characters - PASS
2. Duplicate 4CC Check:
['ID32', 'id3 version 2 container', 'id3v2', 'boxes.csv', 'n/a', 'n/a', 'n/a']
['ID32', 'id3 version 2 meta-data handler (meta box)', 'id3v2', 'handlers.csv', 'n/a', 'n/a', 'handler']
...
['url ', 'a url', 'jpeg2000', 'boxes.csv', 'n/a', 'n/a', 'n/a']
['url ', 'url data location', 'iso', 'data-references.csv', 'n/a', 'n/a', 'data reference']
No duplicates found in the same CSV - PASS
3. Registered Specification Check:
All specs are registered - PASS
4. Missing Columns Check:
No missing columns - PASS
5. Registered Handlers Check:
All handles are registered - PASS
PR passed all checks
```
