Upgrade dependency versions #5
Merged
- This requires renaming OneHotEncoderEstimator -> OneHotEncoder
- Some tests are failing now
- I also updated the Scala dependency versions to keep them up to date
- This is a temporary fix that we should come back to check on later
- This doesn't affect anything right now, since we're not using Scala in the Dockerfile yet.
- When we upgraded from Apache Commons Text 1.4 to 1.9, we got a couple of Jaro-Winkler bugfixes which slightly changed the similarity scores returned here.
- These tests seem to have changed because randomSplit()'s internals changed between Spark 2 and Spark 3, so some of the training results are slightly different.
- These tests are very finicky, and I think they were failing for the same reason as the previous ones: differences in how the training data was split.
- This required an upgrade to Spark 3, so now we can take care of it
- This introduced an error that I had some trouble tracking down. The PackageLoader couldn't load some of the packages because they didn't have a templates/ subdirectory. So I've added some empty templates/ directories to fix the issue.
- This is a dependency of pytest, so let's just let pytest install it
- No other changes seem to be needed
- The biggest change here is that black now cares about whitespace in single-line docstrings.
- This will keep us out of trouble when the deprecated JaroWinklerDistance changes in Apache Commons Text 2.0: https://issues.apache.org/jira/browse/TEXT-104
- I also removed some imports which were causing compiler warnings
- This fixes a bug I introduced in commit 1539079
- Some pandas type-inference code seems to have changed, so we need to change these queries to use integers instead of strings
- I haven't been able to come up with a surefire explanation for why this has changed. My theory is that something internal changed in the RandomForestClassifier with the upgrade from Java 8 to Java 11. The new value seems reasonable, and I don't think that this will cause any issues.
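To show why small bugfixes can move the Jaro-Winkler scores, here is a textbook Jaro-Winkler similarity in Python. It is a minimal sketch, not Apache Commons Text's `JaroWinklerSimilarity`: the matching window, halving the transposition count, and the 0.1 prefix scale capped at 4 characters are common textbook choices and are assumptions here. A change to any detail like these shifts the returned similarity slightly.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1] (sketch, not Commons Text's code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters only count as a match within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # transpositions; the textbook formula uses half that count.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(
    s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4
) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)
```

For example, "MARTHA" and "MARHTA" have a Jaro similarity of 17/18 and, with their shared 3-character prefix, a Jaro-Winkler similarity of 17.3/18 (about 0.961).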
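A fixed seed only fixes the sequence of random draws; which row receives which draw depends on the order in which rows are visited. The toy Python sketch below is not Spark's `randomSplit()` implementation, and `seeded_split` is a hypothetical helper. It shows that the same seed and fraction produce a different training set when only the row order changes, which is one plausible way internal changes between Spark versions could shift a split and therefore the training results.

```python
import random
from typing import List, Sequence, Tuple


def seeded_split(
    rows: Sequence[int], train_frac: float, seed: int
) -> Tuple[List[int], List[int]]:
    """Assign each row to train or test using one seeded uniform draw per row."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        # The i-th draw always goes to the i-th row in iteration order.
        (train if rng.random() < train_frac else test).append(row)
    return sorted(train), sorted(test)


rows = list(range(100))
train_a, test_a = seeded_split(rows, 0.7, seed=42)
train_b, test_b = seeded_split(rows, 0.7, seed=42)  # same order: same split
train_c, test_c = seeded_split(list(reversed(rows)), 0.7, seed=42)  # new order
```

The split is reproducible as long as the row order is stable, but the same seed gives different membership once the order changes, so tests that pin exact training results can break even though the results stay statistically similar.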
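The following is a minimal sketch of why a change in type inference can force queries to compare against integers instead of strings. The CSV data and column names are hypothetical; the actual hlink queries are not shown in this PR. When pandas infers an integer dtype for a column, comparing it to the string "1" matches nothing, while comparing to the integer 1 finds the row.

```python
import io

import pandas as pd

# Hypothetical input with an "id" column that looks numeric.
df = pd.read_csv(io.StringIO("id,name\n1,ada\n2,bob\n"))

# read_csv infers an integer dtype for "id" rather than strings.
id_is_integer = pd.api.types.is_integer_dtype(df["id"])

# Comparing integer values to the string "1" is all False, so nothing matches...
by_string = df[df["id"] == "1"]
# ...while comparing to the integer 1 finds the expected row.
by_int = df[df["id"] == 1]
```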
jacwellington pushed a commit that referenced this pull request on May 23, 2022: Upgrade dependency versions
This PR upgrades many of hlink's dependencies to newer versions. This should make it much easier to maintain and keep it from getting stuck using old, broken versions of different packages.
- …`PackageLoader` works. I added some empty `templates/` subdirectories in link task packages, which fixed the issue.
- …`JaroWinklerSimilarity` instead of `JaroWinklerDistance` in the Scala code. This should save us a headache the next time we upgrade Apache Commons Text, and the two classes currently have exactly the same logic.
- …`pandas.DataFrame.append()` to `pandas.concat()`, following deprecation messages.
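Assuming `PackageLoader` here is Jinja2's `jinja2.PackageLoader`, which failed when a package had no `templates/` subdirectory, a quick check like the hypothetical helper below could flag link task packages that still lack one. The helper name and the package names in the usage are illustrative, not hlink's actual layout.

```python
import tempfile
from pathlib import Path
from typing import List


def packages_missing_templates(root: Path) -> List[str]:
    """Return names of packages directly under root that lack a templates/ dir."""
    missing = []
    for pkg in sorted(root.iterdir()):
        # Only consider directories that are Python packages.
        if not (pkg.is_dir() and (pkg / "__init__.py").is_file()):
            continue
        if not (pkg / "templates").is_dir():
            missing.append(pkg.name)
    return missing


# Usage on a throwaway tree with hypothetical package names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, has_templates in [("matching", True), ("training", False)]:
        pkg = root / name
        pkg.mkdir()
        (pkg / "__init__.py").touch()
        if has_templates:
            (pkg / "templates").mkdir()
    (root / "notes").mkdir()  # a plain directory, not a package
    missing = packages_missing_templates(root)
```

Git does not track empty directories, so committing an "empty" `templates/` directory usually means placing a small placeholder file inside it.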
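For reference, here is a minimal sketch of the `DataFrame.append()` to `pd.concat()` migration. `DataFrame.append()` emits a FutureWarning starting in pandas 1.4 and was removed in pandas 2.0. The frames and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical frames; hlink's real data isn't shown in this PR.
existing = pd.DataFrame({"id": [1, 2], "name": ["ada", "bob"]})
new_rows = pd.DataFrame({"id": [3], "name": ["cy"]})

# Deprecated form (FutureWarning in pandas 1.4, removed in pandas 2.0):
#     combined = existing.append(new_rows, ignore_index=True)
# Replacement: pass the frames to pd.concat as a list.
combined = pd.concat([existing, new_rows], ignore_index=True)

# When appending inside a loop, collect the pieces first and concatenate once.
chunks = [pd.DataFrame({"id": [i], "name": [f"row{i}"]}) for i in range(4, 7)]
combined_many = pd.concat([combined, *chunks], ignore_index=True)
```

Collecting frames in a list and calling `pd.concat` once avoids copying the growing frame on every iteration, which repeated `append()` calls did.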