-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backfill the Spec11ThreatMatches table #714
Conversation
127e468
to
65d0d75
Compare
b272a4c
to
2f686d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 49 at r1 (raw file):
import org.json.JSONObject; public class BackfillSpec11ThreatMatchCommand {
If you make it implement ConfirmingCommand you'll get the prompt
and execute
methods
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 51 at r1 (raw file):
public class BackfillSpec11ThreatMatchCommand { private final GcsUtils gcsUtils;
nit: move this down so it's close to the constructor -- keeps things together and keeps constants separate from the rest of the code
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 53 at r1 (raw file):
private final GcsUtils gcsUtils; private static final FluentLogger logger = FluentLogger.forEnclosingClass(); private static final YearMonth START_DATE = new YearMonth(2019, 01);
nit: technically these are months, not dates
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 54 at r1 (raw file):
private static final FluentLogger logger = FluentLogger.forEnclosingClass(); private static final YearMonth START_DATE = new YearMonth(2019, 01); private static final YearMonth END_DATE = new YearMonth(2020, 07);
We can probably just use the current month
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 69 at r1 (raw file):
ImmutableList<Spec11ThreatMatch> threatMatches = getSpec11ThreatMatchesFromFile(spec11ReportFilename); for (Spec11ThreatMatch threatMatch : threatMatches) {
Probably worth looking in to the jpaTm() method that saves multiple entities (it could save us from having to open up a new transaction every time)
Potentially also look in to batching, so maybe saving like 50 entities at a time.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 94 at r1 (raw file):
.setCheckDate(LocalDate.parse(date, ISODateTimeFormat.date())) .setRegistrarId(registrarId) .setDomainRepoId(getDomainRepoId(domainName, registrarId, new DateTime()))
Generally it's not a good idea to create the DateTime from scratch, since it could vary on how we run this, or just various timing shenanigans. Usually you use one of two options:
- Getting the transaction time from a database transaction that you're inside (this doesn't really apply here)
- Using a particular reference point. Here, this entry is in the past, right? So we should use the date as it was then.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 119 at r1 (raw file):
// This method has not been tested yet, and there may be an off-by-one error private List<YearMonth> getMonthsList(YearMonth start, YearMonth end) {
Can we do something like
ImmutableList.Builder<String> builder = new ImmutableList.Builder();
YearMonth now = ???
YearMonth month = START_MONTH;
while (!month.isAfter(now)) {
builder.add(String.format(REPORTING_FOLDER_FORMAT, month.toString())
month = month.plusMonths(1);
}
return builder.build()
as a method to get all the folders that we want to list? Might be nice to avoid having to specify an end date
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 132 at r1 (raw file):
*/ private String getDateFromFilename(String filename) { return filename.substring(22);
Better to use a regex (with a capturing group)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 154 at r1 (raw file):
} private ImmutableList<GcsFilename> getSpec11ReportFilenames(ImmutableList<LocalDate> dateList)
It seems like we're converting to/from dates and strings a few times, probably more than we need to. We should only need to do things like
- get a list of months in question
- convert those months to file paths to list
- list those paths, get all the full file paths of the spec11 files
- for each path, get all the objects from that path, and use the date from the file name
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 164 at r1 (raw file):
} private String getDomainRepoId(String domainName, String registrarId, DateTime now) {
I don't think this should be here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @gbrodman and @leginachen)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 109 at r1 (raw file):
// Iterate from 1 to size() to skip the header at line 0. for (int i = 1; i < reportLines.size(); i++) { String date = getDateFromFilename(spec11ReportFilename.getBucketName());
I think in your current implementation you already had this date as a LocalDate at one point when you were constructing the list of GcsFilename. Rather than having to construct the date twice, you could use a map that maps a date to a filename, and then you could call this method with both the date and the filename. This may not apply if you end up reconfiguring how you get the GcsFilenames
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 111 at r1 (raw file):
String date = getDateFromFilename(spec11ReportFilename.getBucketName()); for (Spec11ThreatMatch threatMatch : createSpec11ThreatMatches(reportLines.get(i), date)) { builder.add(threatMatch);
you shouldnt have to do this in a for loop, you can just addAll the contents of the response from createSpec11ThreatMatches to the builder
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 136 at r1 (raw file):
private ImmutableList<LocalDate> getDateList() throws IOException { List<YearMonth> bucketMonths = getMonthsList(START_DATE, END_DATE);
Rather than a separate method to get a list of months, why not just use a while loop. Something like...
YearMonth yearMonth = START_DATE;
while(yearMonth <= END_DATE){
....
yearMonth.plus(1);
}
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 154 at r1 (raw file):
Previously, gbrodman wrote…
It seems like we're converting to/from dates and strings a few times, probably more than we need to. We should only need to do things like
- get a list of months in question
- convert those months to file paths to list
- list those paths, get all the full file paths of the spec11 files
- for each path, get all the objects from that path, and use the date from the file name
I agree.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 158 at r1 (raw file):
ImmutableList.Builder<GcsFilename> spec11ReportFilenames = new ImmutableList.Builder<>(); for (LocalDate date : dateList) { GcsFilename gcsFilename = getGcsFilename(REPORTING_BUCKET, date);
the contents of getGcsFilename() can just be inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @gbrodman and @leginachen)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 49 at r1 (raw file):
Previously, gbrodman wrote…
If you make it implement ConfirmingCommand you'll get the
prompt
andexecute
methods
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 54 at r1 (raw file):
Previously, gbrodman wrote…
We can probably just use the current month
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 12 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 69 at r1 (raw file):
Previously, gbrodman wrote…
Probably worth looking in to the jpaTm() method that saves multiple entities (it could save us from having to open up a new transaction every time)
Potentially also look in to batching, so maybe saving like 50 entities at a time.
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 94 at r1 (raw file):
Previously, gbrodman wrote…
Generally it's not a good idea to create the DateTime from scratch, since it could vary on how we run this, or just various timing shenanigans. Usually you use one of two options:
- Getting the transaction time from a database transaction that you're inside (this doesn't really apply here)
- Using a particular reference point. Here, this entry is in the past, right? So we should use the date as it was then.
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 109 at r1 (raw file):
Previously, sarahcaseybot wrote…
I think in your current implementation you already had this date as a LocalDate at one point when you were constructing the list of GcsFilename. Rather than having to construct the date twice, you could use a map that maps a date to a filename, and then you could call this method with both the date and the filename. This may not apply if you end up reconfiguring how you get the GcsFilenames
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 111 at r1 (raw file):
Previously, sarahcaseybot wrote…
you shouldnt have to do this in a for loop, you can just addAll the contents of the response from createSpec11ThreatMatches to the builder
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 119 at r1 (raw file):
Previously, gbrodman wrote…
Can we do something like
ImmutableList.Builder<String> builder = new ImmutableList.Builder(); YearMonth now = ??? YearMonth month = START_MONTH; while (!month.isAfter(now)) { builder.add(String.format(REPORTING_FOLDER_FORMAT, month.toString()) month = month.plusMonths(1); } return builder.build()
as a method to get all the folders that we want to list? Might be nice to avoid having to specify an end date
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 132 at r1 (raw file):
Previously, gbrodman wrote…
Better to use a regex (with a capturing group)
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 136 at r1 (raw file):
Previously, sarahcaseybot wrote…
Rather than a separate method to get a list of months, why not just use a while loop. Something like...
YearMonth yearMonth = START_DATE;
while(yearMonth <= END_DATE){
....
yearMonth.plus(1);
}
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 154 at r1 (raw file):
Previously, sarahcaseybot wrote…
I agree.
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 158 at r1 (raw file):
Previously, sarahcaseybot wrote…
the contents of getGcsFilename() can just be inlined
Done.
This pull request introduces 1 alert when merging bee5919 into d7fb609 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @gbrodman and @leginachen)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 54 at r1 (raw file):
Previously, leginachen (Legina Chen) wrote…
Done.
Does setting this to just new YearMonth() set it to the current month? If so, can you please add a comment that indicates that?
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 55 at r2 (raw file):
Pattern.compile("SPEC11_MONTHLY_REPORT_(\\d{4}-\\d{2}-\\d{2})"); private static final String REPORTING_FOLDER = "domain-registry-reporting/icann/spec11/"; private HashMap<GcsFilename, LocalDate> filenamesToDates = new HashMap<>();
Typically we use ImmutableMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 55 at r2 (raw file):
Previously, sarahcaseybot wrote…
Typically we use ImmutableMap
Done.
bee5919
to
e657de8
Compare
This pull request introduces 1 alert when merging b3fc3dd into d7fb609 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 9 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 48 at r4 (raw file):
public class BackfillSpec11ThreatMatchCommand extends ConfirmingCommand {
We'll also want to override the prompt
method -- this should print out a description of the actions that will be done in the execute() method in some way (perhaps just printing out the number of files we'd parse or something quick like that)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 60 at r4 (raw file):
@Override protected String execute() throws IOException { ImmutableMap<GcsFilename, LocalDate> filenamesToDates =
possible nit: does this fit all on one line?
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 62 at r4 (raw file):
ImmutableMap<GcsFilename, LocalDate> filenamesToDates = mapFilenamesToDates(START_MONTH, END_MONTH); for (GcsFilename spec11ReportFilename : filenamesToDates.keySet()) {
Instead of iterating over the key set, a common way of iterating through maps is the following:
filenamesToDates.entrySet().stream().map(entry -> getSpec11ThreatMatchesFromFile(entry.getKey(), entry.getValue()))
.forEach(threatMatches -> jpaTm().transact(() -> jpaTm().saveNewOrUpdateAll(threatMatches)));
basically streaming the entry set, then mapping each <key, value> entry into the corresponding list, then calling persist on each of the lists
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 68 at r4 (raw file):
jpaTm().transact(() -> jpaTm().saveNewOrUpdateAll(threatMatches)); } // TODO: return a result description
Generally you want to return a string that describes a summary of what was done -- successes, failures, etc
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 77 at r4 (raw file):
JSONArray threatMatchesArray = reportJSON.getJSONArray(Spec11Pipeline.THREAT_MATCHES_FIELD); ImmutableList.Builder<Spec11ThreatMatch> threatMatches = ImmutableList.builder(); for (int i = 0; i < threatMatchesArray.length(); i++) {
super annoying that JSONArray implements Iterable, so we can't use standard iteration mechanisms.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 86 at r4 (raw file):
.setCheckDate(date) .setRegistrarId(reportJSON.getString(Spec11Pipeline.REGISTRAR_CLIENT_ID_FIELD)) .setDomainRepoId(getDomainRepoId(domainName, date.toDateTimeAtCurrentTime()))
Calling toDateTimeAtCurrentTime
seems a bit variable to me. We don't know when the script was run during the day before so it seems better to just use the method that gets it at the start of the day -- it's not going to be right either, but if it's impossible to be right we should be consistent.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 97 at r4 (raw file):
return LocalDate.parse(matcher.group(0), ISODateTimeFormat.date()); }
A thought I had: what happens if we run this script multiple times? We'd get get duplicate entries, right? We might want to give some thought as to how to avoid this.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 106 at r4 (raw file):
} private GcsFilename getGcsFilename(String reportingBucket, LocalDate localDate) {
this method can be inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r4.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 54 at r1 (raw file):
Previously, sarahcaseybot wrote…
Does setting this to just new YearMonth() set it to the current month? If so, can you please add a comment that indicates that?
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 48 at r4 (raw file):
Previously, gbrodman wrote…
We'll also want to override the
prompt
method -- this should print out a description of the actions that will be done in the execute() method in some way (perhaps just printing out the number of files we'd parse or something quick like that)
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 60 at r4 (raw file):
Previously, gbrodman wrote…
possible nit: does this fit all on one line?
I moved this into prompt()
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 62 at r4 (raw file):
Previously, gbrodman wrote…
Instead of iterating over the key set, a common way of iterating through maps is the following:
filenamesToDates.entrySet().stream().map(entry -> getSpec11ThreatMatchesFromFile(entry.getKey(), entry.getValue())) .forEach(threatMatches -> jpaTm().transact(() -> jpaTm().saveNewOrUpdateAll(threatMatches)));
basically streaming the entry set, then mapping each <key, value> entry into the corresponding list, then calling persist on each of the lists
Could not stream because of the unhandled IOException
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 68 at r4 (raw file):
Previously, gbrodman wrote…
Generally you want to return a string that describes a summary of what was done -- successes, failures, etc
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 86 at r4 (raw file):
Previously, gbrodman wrote…
Calling
toDateTimeAtCurrentTime
seems a bit variable to me. We don't know when the script was run during the day before so it seems better to just use the method that gets it at the start of the day -- it's not going to be right either, but if it's impossible to be right we should be consistent.
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 106 at r4 (raw file):
Previously, gbrodman wrote…
this method can be inlined
Done.
b3fc3dd
to
9762a61
Compare
This pull request introduces 1 alert when merging fd59a8d into b2a78b5 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 10 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 40 at r6 (raw file):
import java.util.regex.Pattern; import javax.inject.Inject; import jdk.nashorn.internal.ir.annotations.Immutable;
what's up with this import? I don't think it's supposed to be here
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 53 at r6 (raw file):
private static final YearMonth START_MONTH = new YearMonth(2019, 01); private static final YearMonth END_MONTH = new YearMonth(); // Constructs a YearMonth with the current year and month
nit: we can fit all this on one line if we're more concise about the comment, e.g. YearMonth() defaults to current month
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 58 at r6 (raw file):
private static final Pattern FILENAME_PATTERN = Pattern.compile("SPEC11_MONTHLY_REPORT_(\\d{4}-\\d{2}-\\d{2})"); private static final String REPORTING_FOLDER = "domain-registry-reporting/icann/spec11/";
Technically we cannot rely on the bucket being domain-registry-reporting
. See how we inject the argument in Spec11RegistrarThreatMatchesParser (and indeed we could potentially refactor this + that class to use the same code for file path creation)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 59 at r6 (raw file):
Pattern.compile("SPEC11_MONTHLY_REPORT_(\\d{4}-\\d{2}-\\d{2})"); private static final String REPORTING_FOLDER = "domain-registry-reporting/icann/spec11/"; private static ImmutableMap<GcsFilename, LocalDate> filenamesToDates;
This shouldn't be static; it's specific to one run of the command
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 66 at r6 (raw file):
protected String prompt() throws IOException { filenamesToDates = mapFilenamesToDates(START_MONTH, END_MONTH); return String.format("Parsing through %d files.", numFiles);
it's a code smell to be storing an unnecessary variable numFiles
-- let's just use the size of the filenamesToDates map in some way
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 87 at r6 (raw file):
} else { return String.format( "Successfully parsed through %d files. We failed to parse through the following files: %s",
much better -- a suggestion could be that if we have a ton of files this might look a bit messy, so perhaps we'd want to print them on multiple lines (so joining on \n
, perhaps using Joiner.on('\n').join(someCollection)
Fix small typo Fix a second small typo Add license header Add license header
03b11d9
to
caba9a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 9 unresolved discussions (waiting on @gbrodman and @sarahcaseybot)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 97 at r4 (raw file):
Previously, gbrodman wrote…
A thought I had: what happens if we run this script multiple times? We'd get get duplicate entries, right? We might want to give some thought as to how to avoid this.
Addressing this with Spec11ThreatMatchDao
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 40 at r6 (raw file):
Previously, gbrodman wrote…
what's up with this import? I don't think it's supposed to be here
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 58 at r6 (raw file):
Previously, gbrodman wrote…
Technically we cannot rely on the bucket being
domain-registry-reporting
. See how we inject the argument in Spec11RegistrarThreatMatchesParser (and indeed we could potentially refactor this + that class to use the same code for file path creation)
Done. (I also moved the injects above the other variables to be able to reference the reporting bucket when creating the filepath)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 59 at r6 (raw file):
Previously, gbrodman wrote…
This shouldn't be static; it's specific to one run of the command
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 66 at r6 (raw file):
Previously, gbrodman wrote…
it's a code smell to be storing an unnecessary variable
numFiles
-- let's just use the size of the filenamesToDates map in some way
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 87 at r6 (raw file):
Previously, gbrodman wrote…
much better -- a suggestion could be that if we have a ton of files this might look a bit messy, so perhaps we'd want to print them on multiple lines (so joining on
\n
, perhaps usingJoiner.on('\n').join(someCollection)
Done.
This pull request introduces 1 alert when merging caba9a3 into 26dfdd5 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 6 unresolved discussions (waiting on @gbrodman, @leginachen, and @sarahcaseybot)
core/src/main/java/google/registry/tools/RegistryTool.java, line 36 at r8 (raw file):
.put("ack_poll_messages", AckPollMessagesCommand.class) .put("backfill_registry_locks", BackfillRegistryLocksCommand.class) .put("backfill_spec11threatmatch", BackfillSpec11ThreatMatchCommand.class)
i'd say backfill_spec11_threat_matches
, analogous to the backfill registry locks command
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 97 at r4 (raw file):
Previously, leginachen (Legina Chen) wrote…
Addressing this with Spec11ThreatMatchDao
Resolving for now, but we'll probably want to wait until that's merged in to finalize this.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 85 at r8 (raw file):
spec11ReportFilename, filenamesToDates.get(spec11ReportFilename)); jpaTm.transact(() -> jpaTm.saveNewOrUpdateAll(threatMatches)); } catch (IOException e) {
Let's be a bit more generic with the exceptions we catch here, and we don't want to swallow the exception entirely. Let's use a logger, catch generic Exception
s, log the exception, and then add it to the failedFilesBuilder
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 137 at r8 (raw file):
ImmutableList.Builder<Spec11ThreatMatch> threatMatches = ImmutableList.builder(); try (InputStream in = gcsUtils.openInputStream(spec11ReportFilename)) { InputStreamReader streamReader = new InputStreamReader(in, UTF_8);
You can (and should) have multiple resources in the try-with-resources clause.
that should hopefully help
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 170 at r8 (raw file):
mappedFilenamesToDates.put( new GcsFilename( "domain-registry-reporting", Spec11Pipeline.getSpec11ReportFilePath(fileDate)),
pretty sure this should be taken from the injected bucket
but if we already have the filename, I don't think we should need to call Spec11Pipeline.getSpec11ReportFilePath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 6 unresolved discussions (waiting on @gbrodman and @sarahcaseybot)
core/src/main/java/google/registry/tools/RegistryTool.java, line 36 at r8 (raw file):
Previously, gbrodman wrote…
i'd say
backfill_spec11_threat_matches
, analogous to the backfill registry locks command
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 85 at r8 (raw file):
Previously, gbrodman wrote…
Let's be a bit more generic with the exceptions we catch here, and we don't want to swallow the exception entirely. Let's use a logger, catch generic
Exception
s, log the exception, and then add it to the failedFilesBuilder
Done.
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 137 at r8 (raw file):
Previously, gbrodman wrote…
You can (and should) have multiple resources in the try-with-resources clause.
that should hopefully help
Didn't know about this, thanks! Let me know if that works
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 170 at r8 (raw file):
Previously, gbrodman wrote…
pretty sure this should be taken from the injected bucket
but if we already have the filename, I don't think we should need to call
Spec11Pipeline.getSpec11ReportFilePath
?
Good point! Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like one of your tests is failing presubmit checks because it is checking for the wrong filename. Please fix.
Reviewed 1 of 2 files at r7.
Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @gbrodman)
core/src/main/java/google/registry/tools/RegistryTool.java, line 36 at r8 (raw file):
Previously, leginachen (Legina Chen) wrote…
Done.
I don' think this is done
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 85 at r8 (raw file):
Previously, leginachen (Legina Chen) wrote…
Done.
I don't think you did this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 5 unresolved discussions (waiting on @gbrodman and @sarahcaseybot)
core/src/main/java/google/registry/tools/RegistryTool.java, line 36 at r8 (raw file):
Previously, sarahcaseybot wrote…
I don' think this is done
My bad, didn't git add this file
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 85 at r8 (raw file):
Previously, sarahcaseybot wrote…
I don't think you did this
this was done in my last commit (the code was pushed after the comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r9.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @gbrodman and @leginachen)
core/src/main/java/google/registry/tools/javascrap/BackfillSpec11ThreatMatchCommand.java, line 149 at r9 (raw file):
threatMatches.addAll(createSpec11ThreatMatches(reportLines.get(i), date)); } Spec11ThreatMatchDao.deleteEntriesByDate(jpaTm, date);
This seems like an odd place to put this line of code. Let's put it closer to where we're iterating by date, conceptually
for date in dates:
open transaction
delete entries from this day
get + save new entries from this day
close transaction
that way each day is done transactionally, and you'll never half-finish one
Closing in favor of #897 |
Previously, the Spec11Pipeline stored ThreatMatches in JSON files in a GCS folder. The new Spec11ThreatMatches database is up and running now, and this command is used to backfill the table with the historical data stored in the JSON files (starting from January 31st, 2019).
This change is