Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load all datasets into memory #190

Merged
merged 18 commits into from Feb 4, 2019
Merged

load all datasets into memory #190

merged 18 commits into from Feb 4, 2019

Conversation

yachang
Copy link
Contributor

@yachang yachang commented Jan 28, 2019

This change is Reviewable

@yachang yachang requested a review from gfr10598 January 28, 2019 17:47
Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained


manager/manager.go, line 217 at r1 (raw file):

// FetchAnnotator returns the annoator in memory based on filename.
func (am *AnnotatorMap) FetchAnnotator(key string) (api.Annotator, error) {

Please just modify GetAnnotator instead of introducing a new function.

@coveralls
Copy link

coveralls commented Jan 29, 2019

Pull Request Test Coverage Report for Build 1836

  • 64 of 95 (67.37%) changed or added relevant lines in 3 files are covered.
  • 81 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-2.6%) to 47.386%

Changes Missing Coverage Covered Lines Changed/Added Lines %
manager/manager.go 12 20 60.0%
geoloader/filename.go 49 58 84.48%
handler/handler.go 3 17 17.65%
Files with Coverage Reduction New Missed Lines %
handler/handler.go 8 53.85%
manager/manager.go 73 12.82%
Totals Coverage Status
Change from base Build 1790: -2.6%
Covered Lines: 725
Relevant Lines: 1530

💛 - Coveralls

Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: after changes.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 143 at r2 (raw file):

		if fileDate.Before(GeoLite2StartDate) {
			// temporary hack to avoid legacy
			if !GeoLegacyRegex.MatchString(file.Name) && !GeoLegacyv6Regex.MatchString(file.Name) {

Not clear why this is necessary. Add comment?

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 143 at r2 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Not clear why this is necessary. Add comment?

Done.


manager/manager.go, line 217 at r1 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Please just modify GetAnnotator instead of introducing a new function.

Done.

Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


manager/manager.go, line 227 at r3 (raw file):

// GetAnnotator returns the correct annotator to use for a given timestamp.
func GetAnnotator(request *api.RequestData) (api.Annotator, error) {

I think this is a regression. I changed all the APIs to use the date, instead of a request. Why change it back?

Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 35 at r3 (raw file):

// To speed up online matching, we build another directory for IPv6 as well.
var datasetDirV6 = &directory{}
var datasetDirLockV6 sync.RWMutex

Might be better to use a single lock to protect both, and update them together.


geoloader/filename.go, line 65 at r3 (raw file):

type dateEntry struct {
	date      time.Time
	filenames []string

I guess you only need a single filename now.
I'm also thinking it would be better to store the GeoIP directly here as well. That is a bigger refactoring, so maybe better to do it in a separate PR.


geoloader/filename.go, line 136 at r3 (raw file):

func UpdateArchivedFilenames() error {
	old := getDirectoryV4()
	size := len(old.dates) + 2

Should use the max of v4 and v6 here, or use v4 to size v4, and v6 to size v6.


geoloader/filename.go, line 170 at r3 (raw file):

		}

		if !fileDate.Before(GeoLite2StartDate) && !GeoLite2Regex.MatchString(file.Name) {

This sequence of if statements is becoming hard to reason about.


geoloader/filename.go, line 183 at r3 (raw file):

			dirV6.Insert(fileDate, file.Name)
		}
		DatasetFilenames = append(DatasetFilenames, file.Name)

This will do the wrong thing when Update... is called again. Put some more thought into this.

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 35 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Might be better to use a single lock to protect both, and update them together.

Done.


geoloader/filename.go, line 65 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

I guess you only need a single filename now.
I'm also thinking it would be better to store the GeoIP directly here as well. That is a bigger refactoring, so maybe better to do it in a separate PR.

Done.


geoloader/filename.go, line 136 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Should use the max of v4 and v6 here, or use v4 to size v4, and v6 to size v6.

Done.

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 170 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

This sequence of if statements is becoming hard to reason about.

This is to discard all dataset that is after cutoff date, but not Geolite2, since we download legacy datasets as well after cutoff date.

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 183 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

This will do the wrong thing when Update... is called again. Put some more thought into this.

Thanks!

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


manager/manager.go, line 227 at r3 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

I think this is a regression. I changed all the APIs to use the date, instead of a request. Why change it back?

because we need info of IPv4 or IPv6 to select an annotator now unfortunately :(

Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 135 at r4 (raw file):

		return err
	}
	DatasetFilenames = []string{}

I think you need to protect DatasetFilenames with the mutex, too.
Suggest you make it local, and access it with an exported access function.

Probably should then create all the slices, then take the lock and replace them all at once.

Copy link
Contributor

@gfr10598 gfr10598 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 135 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

I think you need to protect DatasetFilenames with the mutex, too.
Suggest you make it local, and access it with an exported access function.

Probably should then create all the slices, then take the lock and replace them all at once.

BTW, it would be nice to have a unit test that shows this race condition. Let me know if you want help creating one.

Copy link
Contributor Author

@yachang yachang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 LGTMs obtained


geoloader/filename.go, line 135 at r4 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

BTW, it would be nice to have a unit test that shows this race condition. Let me know if you want help creating one.

Done.

@yachang yachang merged commit fee921e into master Feb 4, 2019
@stephen-soltesz stephen-soltesz deleted the load-all branch August 12, 2022 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants