Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Bug 858245 - Added a scrubber app to use in the API. #328

Merged
merged 1 commit into from Apr 26, 2013

Conversation

Projects
None yet
2 participants
Member

adngdb commented Apr 25, 2013

@peterbe What do you think? It's quite basic but I think it covers our needs.

Member

adngdb commented Apr 25, 2013

Also, did I do it the right way as a new module in crashstats?

Contributor

peterbe commented Apr 25, 2013

This is brilliant! r+

I think you can change the commit message to point to https://bugzilla.mozilla.org/show_bug.cgi?id=858245 but we don't want to close the bug yet since we also need to use it somewhere.

@peterbe peterbe commented on an outdated diff Apr 25, 2013

crashstats/scrubber/__init__.py
+# source: http://stackp.online.fr/?p=19
+EMAIL = re.compile('([\w\-\.]+@(\w[\w\-]+\.)+[\w\-]+)')
+
+
+# source: http://stackoverflow.com/questions/520031
+URL = re.compile(
+ r"((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.‌​]"
+ "[a-z]{2,4}/)(?:[^\s()<>]+|(([^\s()<>]+|(([^\s()<>]+)))*))+(?:(([^\s()<>]"
+ "+|(‌​([^\s()<>]+)))*)|[^\s`!()[]{};:'\".,<>?«»“”‘’]))",
+ re.DOTALL
+)
+
+
+def scrub_data(
+ data,
+ remove_fields=None,
@peterbe

peterbe Apr 25, 2013

Contributor

perhaps change these three lines to *args, **kwargs and then...

@peterbe peterbe commented on an outdated diff Apr 25, 2013

crashstats/scrubber/__init__.py
+
+def scrub_data(
+ data,
+ remove_fields=None,
+ replace_fields=None,
+ clean_fields=None
+):
+ """Return a scrubbed copy of a list of dictionaries.
+
+ See `scrub_dict` for parameters.
+ """
+ scrubbed = list(data)
+ for i, item in enumerate(scrubbed):
+ scrubbed[i] = scrub_dict(
+ item,
+ remove_fields=remove_fields,
@peterbe

peterbe Apr 25, 2013

Contributor

...replace these three lines with *args, **kwargs.

That way, if we one day change the parameter signature of scrub_dict we only need to change it in one place.

@peterbe peterbe commented on an outdated diff Apr 25, 2013

crashstats/scrubber/__init__.py
+ * list or tuple of 2-uples
+ * search for patterns in those fields and remove what matches
+ * example: clean_fields=[('comment', EMAIL), ('comment', URL)]
+
+ Any number of those options can be used in the same call. If none is used,
+ return the dictionary unchanged.
+ """
+ if remove_fields is None:
+ remove_fields = []
+ if replace_fields is None:
+ replace_fields = []
+ if clean_fields is None:
+ clean_fields = []
+
+ scrubbed = data.copy()
+ for key in remove_fields:
@peterbe

peterbe Apr 25, 2013

Contributor

One trick I sometimes use to avoid the whole if param is None: param = [] is this:

for key in remove_fields or []:
    ...
Contributor

peterbe commented Apr 25, 2013

Regarding "Also, did I do it the right way as a new module in crashstats?"...
Ideally in django you want this to be part of app. Or some sort of base package. If django-nose is able to pick up the tests with ./manage.py tests I think it's fine.

@adngdb adngdb added a commit that referenced this pull request Apr 26, 2013

@adngdb adngdb Merge pull request #328 from AdrianGaudebert/scrubber
Bug 858245 - Added a scrubber app to use in the API.
9cbc154

@adngdb adngdb merged commit 9cbc154 into mozilla:master Apr 26, 2013

1 check passed

default Jenkins build 'socorro-crashstats-github' #414 has succeeded
Details

@adngdb adngdb deleted the adngdb:scrubber branch Apr 26, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment