Prevent creation of duplicit projects #584

Open
sanjayankur31 opened this Issue Aug 22, 2018 · 13 comments

4 participants
@sanjayankur31

sanjayankur31 commented Aug 22, 2018

We seem to have 2 entries for "nest". Could one be removed please?

@Zlopez

Member

Zlopez commented Aug 22, 2018

This is definitely a bug; we shouldn't allow creating projects with the same name and the same backend.

@pypingou

Contributor

pypingou commented Aug 22, 2018

The URLs are different, that's why it was allowed :)

@Zlopez

Member

Zlopez commented Aug 22, 2018

I know; the homepage shouldn't be part of the primary key.

@pypingou

Contributor

pypingou commented Aug 22, 2018

@Zlopez

Member

Zlopez commented Aug 22, 2018

In the current version there are two uniqueness constraints on a project: (name, homepage) and (name, ecosystem).
The ecosystem mostly corresponds to the language's package index: pypi, maven, npm, etc.
If the ecosystem isn't defined, the homepage is used instead.
So even if I remove the (name, homepage) constraint, this issue remains.
I will remove it anyway, because right now it is redundant.

Thinking about it more, it's probably not a bug, but we should at least show a warning that a project with the same name and the same backend already exists.
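
To illustrate why both "nest" entries were accepted, here is a minimal sketch using Python's sqlite3 module. The table name, column names, and example URLs are assumptions made for illustration and are not the project's actual schema; only the two uniqueness constraints and the homepage fallback for the ecosystem come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the two uniqueness constraints come from the thread.
conn.execute(
    """
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        homepage TEXT NOT NULL,
        ecosystem TEXT NOT NULL,
        UNIQUE (name, homepage),
        UNIQUE (name, ecosystem)
    )
    """
)


def add_project(name, homepage, ecosystem=None):
    """Insert a project; fall back to the homepage when no ecosystem is given."""
    conn.execute(
        "INSERT INTO projects (name, homepage, ecosystem) VALUES (?, ?, ?)",
        (name, homepage, ecosystem or homepage),
    )


# Two "nest" entries whose homepages differ only in scheme and trailing slash
# (made-up URLs) satisfy both constraints, so both rows are stored.
add_project("nest", "http://example.org/nest")
add_project("nest", "https://example.org/nest/")

count = conn.execute(
    "SELECT COUNT(*) FROM projects WHERE name = 'nest'"
).fetchone()[0]

# Only an exact repeat of name and homepage is rejected.
try:
    add_project("nest", "http://example.org/nest")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the ecosystem falls back to the homepage, the (name, ecosystem) constraint behaves just like (name, homepage) for such projects, which is why dropping one of the two does not change the outcome.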

@Zlopez Zlopez removed the bug label Aug 22, 2018

@sanjayankur31

Author

sanjayankur31 commented Aug 22, 2018

Yeah, maybe a warning along the lines of "We found similarly named projects already in our list: a, b, c. Is this one of them?" would be quite helpful. Thanks :)
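
A rough sketch of how such a warning could be produced, using difflib from the Python standard library. The function names, the 0.8 similarity cutoff, and the choice of difflib are all assumptions of this sketch; the thread does not say how similar names are or should be detected.

```python
import difflib


def similar_project_names(new_name, existing_names, cutoff=0.8):
    """Return existing names that look close to new_name, ignoring case."""
    # Map each lowercase form back to the stored spelling(s).
    by_lower = {}
    for name in existing_names:
        by_lower.setdefault(name.lower(), []).append(name)
    close = difflib.get_close_matches(
        new_name.lower(), list(by_lower), n=5, cutoff=cutoff
    )
    return [name for key in close for name in by_lower[key]]


def duplicate_warning(new_name, existing_names):
    """Build the warning text, or return None when nothing similar exists."""
    matches = similar_project_names(new_name, existing_names)
    if not matches:
        return None
    return (
        "We found similarly named projects already in our list: "
        + ", ".join(matches)
        + ". Is this one of them?"
    )
```

A submit form could call duplicate_warning() with the proposed name and show the result before accepting the project.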

@pypingou

Contributor

pypingou commented Aug 22, 2018

@Zlopez

Member

Zlopez commented Aug 22, 2018

There is already a notification about projects with the same name, but it should be stricter.
Maybe we could ask again whether you really want to submit the project when a similar name is found.

@jeremycline

Member

jeremycline commented Aug 22, 2018

One thing to help with this would be to modify the URL used as the ecosystem in a predictable way (without changing the version used as the homepage). Right now a lot of the dupes come from http vs https and a trailing slash (or not). If the ecosystem was always HTTPS and included a trailing slash that would catch a lot.

That is, at

new_obj.ecosystem_name = new_obj.homepage
modify the URL a bit
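
A minimal sketch of that normalization, assuming a hypothetical helper named normalize_ecosystem_url and a homepage that already carries a scheme. It applies only the two rules mentioned above (always HTTPS, always a trailing slash) and leaves the stored homepage untouched.

```python
from urllib.parse import urlsplit


def normalize_ecosystem_url(homepage):
    """Return a canonical form of homepage for use as the ecosystem key only."""
    parts = urlsplit(homepage.strip())
    # Ensure exactly one trailing slash on the path.
    path = parts.path.rstrip("/") + "/"
    # Force the scheme to https; host, query and fragment are kept as given.
    return parts._replace(scheme="https", path=path).geturl()


# Hypothetical use at the assignment quoted above:
# new_obj.ecosystem_name = normalize_ecosystem_url(new_obj.homepage)
```

With this, http://example.org/nest and https://example.org/nest/ (made-up URLs) map to the same ecosystem value, so the existing (ecosystem, name) constraint would reject the second entry.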

@Zlopez

Member

Zlopez commented Aug 22, 2018

Yes, but this is not true for every ecosystem. Some backends don't use any ecosystem, so the homepage is set as the ecosystem.
And I'm not sure it is good to presume that every URL will be HTTPS.

Maybe it would be better to do the validation per backend,
for example by using the repository_url (not the homepage) as the primary key and validating it with JavaScript on the frontend.
Something like this:
gitlab - URL must be validated as https://gitlab.[hostname.]com/<owner>/<repo>
github - URL must be validated as https://github.com/<owner>/<repo>
Then if the project already exists, you will see it almost immediately and can stop the user from creating a duplicate project.

This will probably not work for the custom backend, because it is less strict.

I need to check whether this is possible for every other backend.
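
The two URL shapes listed above can be sketched as regular expressions. The comment proposes doing this in JavaScript on the frontend; the Python version below only illustrates the patterns. The dictionary keys, the function name, and the characters allowed in <owner> and <repo> are assumptions, not rules taken from either hosting service.

```python
import re

# Patterns transcribed from the shapes listed above; the backend keys and the
# allowed characters for <owner> and <repo> are assumptions of this sketch.
REPOSITORY_URL_PATTERNS = {
    "github": re.compile(
        r"https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)"
    ),
    "gitlab": re.compile(
        r"https://gitlab\.(?:[\w-]+\.)?com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)"
    ),
}


def parse_repository_url(backend, url):
    """Return (owner, repo) if url matches the backend's shape, else None."""
    pattern = REPOSITORY_URL_PATTERNS.get(backend)
    if pattern is None:
        # Backends without a strict shape (e.g. custom) are not validated here.
        return None
    # Tolerate a trailing slash, then require the whole URL to match.
    match = pattern.fullmatch(url.rstrip("/"))
    if match is None:
        return None
    return match.group("owner"), match.group("repo")
```

The returned (owner, repo) pair could then serve as the lookup key for an existing project on the same backend.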

@jeremycline

Member

jeremycline commented Aug 22, 2018

> Yes, but this is not true for every ecosystem. Some backends don't use any ecosystem, so the homepage is set as the ecosystem.
> And I'm not sure it is good to presume that every URL will be HTTPS.

Yes, the homepage should not be presumed to be HTTPS, but the URL-as-ecosystem is a hack to namespace generic projects that aren't part of a real ecosystem and to allow for the uniqueness constraint on (ecosystem, name). The value assigned to the ecosystem is never used as a URL, so some normalization should be safe there.

@Zlopez

Member

Zlopez commented Aug 29, 2018

As I understand it, we should use the normalized homepage as the ecosystem.
Every homepage without an ecosystem would be converted to something like this:
https://<hostname>/<path>/

This means every http URL will be saved in the ecosystem as https, and every URL will have a trailing /.

I think this should be enough normalization.

But the warning for the same project name should still be shown on submit, to prevent adding the same project in different ecosystems, for example PyPI and GitHub.

Also, I noticed that the name is not normalized; there should at least be a check for the same name after lowercasing (toLower). I spotted duplicate projects whose only difference is capitalization: fwts vs FWTS.
But this could possibly be prevented by the warning.
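
A small sketch of the case-insensitive check, useful for finding existing entries such as fwts and FWTS. The function name is hypothetical, and str.casefold() is used here as a stand-in for the toLower comparison mentioned above.

```python
from collections import defaultdict


def case_insensitive_duplicates(names):
    """Group names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more aggressive lower() intended for comparisons.
        groups[name.casefold()].append(name)
    # Keep only the groups that hold more than one spelling.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Run over the stored project names, this returns one entry per clashing group, keyed by the case-folded name.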

@Zlopez Zlopez added the enhancement label Aug 29, 2018

@Zlopez Zlopez changed the title from "Removing duplication projects" to "Prevent creation of duplicit projects" Aug 29, 2018

@jeremycline

Member

jeremycline commented Aug 29, 2018

Agreed, I think that will also help

@Zlopez Zlopez added this to the 0.16.0 milestone Feb 21, 2019
