New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databased Backed Blacklists #2396

Merged
merged 16 commits into from Sep 17, 2017

Conversation

Projects
None yet
4 participants
@dstufft
Member

dstufft commented Sep 16, 2017

This removes the blacklist that was hardcoded, and instead replaces it with one that is stored in the database. It also provides an admin UI to view and add new blacklists.

This is a somewhat of a dangerous workflow, because blacklisting a name also involves removing all files, releases, etc that exist for that (essentially the same thing as hitting remove package on legacy PyPI). Due to the danger, adding a blacklist requires confirmation.

This still requires a few things:

  • Add the ability to remove existing blacklists to allow users to claim the name again.
  • Make the confirmation page harder to blindly click though.
    • This might be requiring typing the name of the project that would be deleted into an input box?
    • This might require typing in a random string?
  • Write tests

Throwing this up here so people can take a look at it before I finish it.

@dstufft dstufft requested a review from ewdurbin Sep 16, 2017

@dstufft dstufft changed the title from [WIP] Databased Backed Blacklists to Databased Backed Blacklists Sep 16, 2017

dstufft added some commits Sep 16, 2017

request.db.add(
JournalEntry(
name=project.name,
action="remove",

This comment has been minimized.

@ewdurbin

ewdurbin Sep 16, 2017

Member

should the action here be blacklisted?

@ewdurbin

ewdurbin Sep 16, 2017

Member

should the action here be blacklisted?

This comment has been minimized.

@dstufft

dstufft Sep 17, 2017

Member

I'm going to leave this alone, because other things might be reading our journal entries and inferring things from them, and we want them to still go ahead and remove these projects if they are. It's not really super important to keep in the log that a project was blacklisted either, since it's a binary state and admins can see when it was and who did it (as well as any relevant comments).

@dstufft

dstufft Sep 17, 2017

Member

I'm going to leave this alone, because other things might be reading our journal entries and inferring things from them, and we want them to still go ahead and remove these projects if they are. It's not really super important to keep in the log that a project was blacklisted either, since it's a binary state and admins can see when it was and who did it (as well as any relevant comments).

@ewdurbin

No blockers. I would like for us to work on a method of bulk blacklisting at some point.

In recent reports, there have been dozens of package names affected that would be somewhat cumbersome to blacklist one-by-one

@dstufft

This comment has been minimized.

Show comment
Hide comment
@dstufft

dstufft Sep 17, 2017

Member

Feedback handled. I'm a little nervous about the idea of a bulk blacklisting, since blacklisting means deleting packages and is generally irreversible, perhaps once we get some more experience with the single entrypoint we can explore that?

Member

dstufft commented Sep 17, 2017

Feedback handled. I'm a little nervous about the idea of a bulk blacklisting, since blacklisting means deleting packages and is generally irreversible, perhaps once we get some more experience with the single entrypoint we can explore that?

dstufft added some commits Sep 17, 2017

fix

@dstufft dstufft merged commit 52b2471 into pypa:master Sep 17, 2017

2 checks passed

codecov/project 100% (target 100%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dstufft dstufft deleted the dstufft:db-backed-blacklist branch Sep 17, 2017

@ncoghlan

This comment has been minimized.

Show comment
Hide comment
@ncoghlan

ncoghlan Sep 17, 2017

Member

As far as a bulk feature goes, that may be more appropriate to tackle as a "bulk quarantine" (i.e. shuffle all the affected artifacts off to a different S3 bucket, without deleting them outright) rather than as bulk automation of the existing removal behaviour (quarantine also has the benefit of preserving the artifacts for security analysis, similar to the way anti-virus quarantining works).

Member

ncoghlan commented Sep 17, 2017

As far as a bulk feature goes, that may be more appropriate to tackle as a "bulk quarantine" (i.e. shuffle all the affected artifacts off to a different S3 bucket, without deleting them outright) rather than as bulk automation of the existing removal behaviour (quarantine also has the benefit of preserving the artifacts for security analysis, similar to the way anti-virus quarantining works).

@dstufft

This comment has been minimized.

Show comment
Hide comment
@dstufft

dstufft Sep 17, 2017

Member

We don't actually delete the files from S3 even with bulk delete. The cost of storing the files is minuscule it's actually serving the files that actually costs the bulk of our S3 "bill". This also makes it easier for us to deal with transactionally deleting these projects since it's all contained inside of the database.

So deleting here is really just deleting the data from the database (although maybe at some point we'll have some process to garbage collect data from S3 that is no longer referenced in the DB).

Member

dstufft commented Sep 17, 2017

We don't actually delete the files from S3 even with bulk delete. The cost of storing the files is minuscule it's actually serving the files that actually costs the bulk of our S3 "bill". This also makes it easier for us to deal with transactionally deleting these projects since it's all contained inside of the database.

So deleting here is really just deleting the data from the database (although maybe at some point we'll have some process to garbage collect data from S3 that is no longer referenced in the DB).

@ncoghlan

This comment has been minimized.

Show comment
Hide comment
@ncoghlan

ncoghlan Sep 17, 2017

Member

OK, so even today the database entries for "deleted" releases could potentially be backfilled from the artifacts retained in S3. I think that would already mitigate a lot of the risk associated with removal errors, even if the scripts for state reconstruction don't exist yet.

Member

ncoghlan commented Sep 17, 2017

OK, so even today the database entries for "deleted" releases could potentially be backfilled from the artifacts retained in S3. I think that would already mitigate a lot of the risk associated with removal errors, even if the scripts for state reconstruction don't exist yet.

func.normalize_pep426_name(form.name.data))).scalar():
raise _exc_with_message(
HTTPBadRequest,
"The name {!r} is not allowed.".format(form.name.data),

This comment has been minimized.

@benjaoming

benjaoming Sep 17, 2017

Hi @dstufft - glad you started something!

Would suggest the message to be a little more clear?

For concerns of security (namesquatting and typosquatting), '{!r}' is not an allowed name for a project. Please use a different name.
@benjaoming

benjaoming Sep 17, 2017

Hi @dstufft - glad you started something!

Would suggest the message to be a little more clear?

For concerns of security (namesquatting and typosquatting), '{!r}' is not an allowed name for a project. Please use a different name.

This comment has been minimized.

@dstufft

dstufft Sep 17, 2017

Member

I don't want to get into specifics like that because there are a lot of reasons a name might be prohibited, not just for security purposes. If we want to provide details about why a name is blocked, we'll want to either expose the comment field or (more likely) provide categories to the blacklist.

@dstufft

dstufft Sep 17, 2017

Member

I don't want to get into specifics like that because there are a lot of reasons a name might be prohibited, not just for security purposes. If we want to provide details about why a name is blocked, we'll want to either expose the comment field or (more likely) provide categories to the blacklist.

This comment has been minimized.

@ncoghlan

ncoghlan Sep 18, 2017

Member

I added that question to #2401 (comment) (as I think it's related to the general UX of how we communicate invalid name declarations to end users)

@ncoghlan

ncoghlan Sep 18, 2017

Member

I added that question to #2401 (comment) (as I think it's related to the general UX of how we communicate invalid name declarations to end users)

@benjaoming benjaoming referenced this pull request Sep 17, 2017

Closed

Impact on PyPI Search #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment