
Fixes bug 901977 - Store raw crash data into elasticsearch. #1647

Closed
adngdb wants to merge 17 commits into mozilla-services:master from adngdb:901977-raw-crash-json-in-elasticsearch

Conversation

@adngdb adngdb commented Nov 4, 2013

Contributor


Instead of making the raw crash a branch of the processed crash, could you consider making a two-branch tree instead?

raw_and_processed = {
    'raw_crash': raw_crash,
    'processed_crash': processed_crash
}

Or would that be too disruptive to all the data that's already in ES?

One of my current initiatives is to unify the fragmented processed_crash format. Currently PG, ES, and HB/FS each store the processed crash in a slightly different form. PG/HB/FS are all lossy; the new redaction methods and saving the JSON form of the processed crash in PG are all about making every store hold exactly the same data.

If you add the 'raw_crash' key to the processed crash, you're making the ES processed crash different from the others. When we eventually document the processed_crash schema, we'll have to make an exception for ES and point out the difference.
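To illustrate the difference, here is a minimal sketch of the two-branch approach. The helper name `build_es_document` and the sample values are hypothetical and are not taken from the Socorro code base; the point is only that the `processed_crash` branch stays identical to what the other stores hold.

```python
import copy


def build_es_document(raw_crash, processed_crash):
    """Wrap both crashes in a two-branch document without mutating either.

    Hypothetical helper for illustration only.
    """
    return {
        'raw_crash': copy.deepcopy(raw_crash),
        'processed_crash': copy.deepcopy(processed_crash),
    }


# Hypothetical sample data, not real crash contents.
raw_crash = {'ProductName': 'Firefox', 'Version': '25.0'}
processed_crash = {'uuid': 'example-uuid', 'signature': 'example_signature'}

doc = build_es_document(raw_crash, processed_crash)

# The processed branch is unchanged, so one schema can describe it
# for every store; no 'raw_crash' key leaks into it.
assert doc['processed_crash'] == processed_crash
assert 'raw_crash' not in doc['processed_crash']
```

With this layout, a future processed_crash schema document would not need an ES-specific exception.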

Contributor Author


Doing that would indeed imply changes to both advanced search and supersearch, as well as a full reindexing of our database. We might need to do the reindexing at some point, especially since we will want to have that raw_crash field everywhere. Maybe it is worth putting in the effort now.

I'm a bit concerned that this change might break search for a little while, though. I'm not quite sure what the data strategy would be here. I expect that we will need to reprocess the last 6 months of crashes (putting them in elasticsearch only, with no need to reindex in postgres and hbase). Reprocessing will be needed because we don't have unredacted processed crashes in HBase yet, and we want PII data to be in elasticsearch.

I would be happy to discuss a strategy for elasticsearch-only reprocessing with you.
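As a rough sketch of the search-side change mentioned above: once documents have two branches, a field that used to sit at the top level would be addressed through its branch, using Elasticsearch's dotted path notation for nested object fields. The helpers `namespaced_field` and `term_filter` below are hypothetical and are not the actual advanced search or supersearch code.

```python
def namespaced_field(field, branch='processed_crash'):
    """Return the dotted path of a field inside one branch of the document."""
    return '%s.%s' % (branch, field)


def term_filter(field, value, branch='processed_crash'):
    """Build an Elasticsearch term filter that targets a namespaced field."""
    return {'term': {namespaced_field(field, branch): value}}


# Before: the field sits at the top level of the indexed document.
old_filter = {'term': {'signature': 'example_signature'}}

# After: the same field lives under the 'processed_crash' branch.
new_filter = term_filter('signature', 'example_signature')

# Raw crash fields become searchable under their own branch.
raw_filter = term_filter('ProductName', 'Firefox', branch='raw_crash')
```

Every query builder that names a field would need this kind of prefixing, which is why the change touches both search implementations and requires reindexing existing documents.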


adngdb commented Nov 14, 2013

Closing for the moment, will reopen when it is ready for review.

@adngdb adngdb closed this Nov 14, 2013
adngdb and others added 16 commits November 14, 2013 15:38
…remove-deprecated-middleware

Fixes bug 891921 - Removed all files related to the old, obsolete middleware.
…sig-hist-doc

Fixes bug 938410 - Fixed example in signature_history documentation.
…block

Bug 939141 - Annotate the largest free VM block in the processed crash. r=ted
Fixes Bug 931147 - tagged logging of transaction failures with name of the resource experiencing failure
…6-non-plotted-graphs-on-topcrasher

Bug789526 non plotted graphs on topcrasher
@adngdb adngdb reopened this Nov 19, 2013
@adngdb adngdb closed this Nov 19, 2013

5 participants