Import script #3

Closed
j3k0 opened this issue Jan 20, 2017 · 9 comments

@j3k0
Owner

j3k0 commented Jan 20, 2017

Implement an import script, with input data organized as described below.

All users are stored in a directory. This directory will contain a lot of JSON files, named after the username (<username>.json). Each JSON file represents a single user.

It contains an object with the following fields (all of type string, though middleName may be null when it does not apply):

  • username:
    • the user's unique id
  • email:
    • the user's email
  • givenName:
    • can have values: "Facebook" or "Email"
  • middleName:
    • user's app-scoped facebook ID
    • only if givenName equals "Facebook"

We want to import those into ganomede-directory with the following transformation.

Output Field                          Output Value
------------------------------------  -----------------------
user id                               input.username
password                              randomPassword()
alias name (public)                   input.username
alias tag (public)                    tagizer(input.username)
alias email (private)                 input.email
alias facebook.id.APP_ID (private)    input.middleName

  • APP_ID is a constant provided as an env variable or CLI argument (required).
    • (why? because since Graph API v2, Facebook user ids are app-scoped)
  • randomPassword() is a function that generates a safe random password.
  • tagizer() is a function that generates the unambiguous version of a username. See the ganomede-tagizer micro-library.
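
As a rough sketch of the two helpers described above, assuming Node.js: randomPassword() could draw from the platform CSPRNG via crypto.randomBytes, and APP_ID could be read from a CLI argument with an env fallback. The 24-byte length, the base64 encoding, the resolveAppId name and the precedence of the CLI argument over the env variable are all assumptions made for illustration, not project decisions.

```javascript
const crypto = require('crypto');

// Hypothetical helper: random bytes from the OS CSPRNG, base64-encoded.
// 24 bytes (192 bits) is an assumed length, not a project requirement.
function randomPassword(byteLength = 24) {
  return crypto.randomBytes(byteLength).toString('base64');
}

// Hypothetical helper: APP_ID is required. It is taken from the first CLI
// argument if present, otherwise from the APP_ID env variable.
function resolveAppId(args, env) {
  const appId = args[0] || env.APP_ID;
  if (!appId) {
    throw new Error('APP_ID is required (CLI argument or env variable)');
  }
  return appId;
}
```

In a script this might be called as resolveAppId(process.argv.slice(2), process.env), failing early when neither source is set.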

Example input data

cat NamilleX07.json

{
  "email": "02.blablah@live.fr",
  "givenName": "Facebook",
  "middleName": "925612319824595",
  "username": "NamilleX07"
}

cat 05lala61.json

{
  "email": "alex.lala@orange.fr",
  "givenName": "Email",
  "middleName": null,
  "surname": "05lala61",
  "username": "05lala61"
}

Corresponding output data

{
    "id": "NamilleX07",
    "aliases": [
        [1481436006, "email", "02.blablah@live.fr"],
        [1481436006, "name", "NamilleX07", true],
        [1481436006, "tag", "namiiiexo7", true],
        [1481512304, "facebook.id.myapp", "925612319824595"]
    ],
    "hash": "long-crypto-level-string-encoding-the-password"
}
{
    "id": "05lala61",
    "aliases": [
        [1481436006, "email", "alex.lala@orange.fr"],
        [1481436006, "name", "05lala61", true],
        [1481436006, "tag", "osiaia6i", true]
    ],
    "hash": "long-crypto-level-string-encoding-the-password"
}
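
The mapping above could be sketched roughly as follows. This is only an illustration: the { type, value, public } alias shape and the convertUser name are hypothetical, and tagizer and randomPassword are passed in as parameters so that no library API is assumed. The lowercase function used in the example call is a placeholder and does not reproduce the real tagizer output.

```javascript
// Sketch of the field mapping. `tagizer` and `randomPassword` are injected
// rather than imported, so no real library signature is assumed here.
// The {type, value, public} alias shape is hypothetical.
function convertUser(input, { appId, tagizer, randomPassword }) {
  const aliases = [
    { type: 'email', value: input.email, public: false },
    { type: 'name', value: input.username, public: true },
    { type: 'tag', value: tagizer(input.username), public: true },
  ];
  // middleName only carries a Facebook id when givenName is "Facebook".
  if (input.givenName === 'Facebook' && input.middleName) {
    aliases.push({
      type: `facebook.id.${appId}`,
      value: input.middleName,
      public: false,
    });
  }
  // Any other input field (e.g. surname) is ignored.
  return { id: input.username, password: randomPassword(), aliases };
}

const examplePayload = convertUser(
  {
    email: '02.blablah@live.fr',
    givenName: 'Facebook',
    middleName: '925612319824595',
    username: 'NamilleX07',
  },
  {
    appId: 'myapp',
    tagizer: (name) => name.toLowerCase(), // placeholder, NOT the real tagizer
    randomPassword: () => 'placeholder-password', // placeholder, not random
  }
);
console.log(JSON.stringify(examplePayload, null, 2));
```
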
@elmigranto
Collaborator

elmigranto commented Feb 2, 2017

Couple of questions:

  1. Is this to copy stuff from the old format to the directory format?
  2. Perhaps it would be better to convert this to a payload for the user creation endpoint? That said, the desired output looks different from both what we store in couch and what that endpoint wants to see.
  3. I assume we set the date on the alias document to new Date() (current time)?
  4. I assume we ignore middleName if givenName isn't "Facebook", since the email is already there?

@elmigranto elmigranto mentioned this issue Feb 2, 2017
4 tasks
@j3k0 j3k0 added in progress and removed ready labels Feb 2, 2017
@j3k0
Owner Author

j3k0 commented Feb 2, 2017

> Is this to copy stuff from old format to directory format?

Yes. We're migrating away from Stormpath. I've built a proof-of-concept stormpath export script that'll get us data in the described format.

We'll get a big-ass folder with 600,000 users in it. It'll then be split into small chunks by a shell script.

> Perhaps it would be better to convert this to payload for user creation endpoint?

Yes, would be great. That keeps things flexible. We need to know which user-import failed though. There might be 2 users that have the same "tag" for instance: expecting that to be rare, we'll resolve those issues manually.

> That being said, desired output looks different from both, what we store in couch and what that endpoint wants to see. I assume we set date on alias document to new Date() (current time)?

Yup. Desired "output" is just an illustration. I think I took the format of the GET /users request, doesn't really matter as long as it's clear to you!

> I assume we ignore middleName if it isn't Facebook, since email is already there?

Yes, there might be extra fields in the input that have to be ignored as well.

@elmigranto
Collaborator

elmigranto commented Feb 2, 2017

All right, let's use the user-creation payload format (id, password, aliases), and the script can either:

  1. do everything itself (convert, send, print results);
  2. or output payloads for curl to send:
    1. to stdout
    2. to a file in the same folder named ${inputFilename}.out.

Not sure which option under 2. would be easier to wrap in xargs / curl / whatever. Or maybe we just give it a list of filenames or a directory, and it does all the things and prints JSON with successes and errors to stdout.
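
A minimal sketch of the "do everything and report" variant, assuming Node.js. The send function is a stand-in supplied by the caller for the real HTTP request, whose URL and body format are not specified in this thread; publishAll and the report shape are hypothetical names chosen for illustration.

```javascript
// Sends each payload with the caller-supplied async `send` function and
// collects the outcome per user id, so failed imports can be found later.
// `send` stands in for the real HTTP call, which is not specified here.
async function publishAll(payloads, send) {
  const report = { successes: [], errors: [] };
  for (const payload of payloads) {
    try {
      await send(payload);
      report.successes.push(payload.id);
    } catch (err) {
      // Keep going: one failed user should not abort the whole import.
      report.errors.push({ id: payload.id, message: err.message });
    }
  }
  return report;
}
```

The resulting report can then be printed with console.log(JSON.stringify(report)) and redirected to a file for manual follow-up on the failures.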

@elmigranto
Collaborator

About generating passwords: do we care what they are, or do we expect users to change them later?

@j3k0
Owner Author

j3k0 commented Feb 2, 2017

I suppose we can pipeline with multiple smaller tools written with whatever is more convenient.

  • converter: read "old format" from a file, stdout the payload
    • probably easier with node
  • publisher: read payload from file, stdout success/error
    • probably a simple bash+curl script

then some bash glue around that.

@j3k0
Owner Author

j3k0 commented Feb 2, 2017

Passwords can be set to random, safe ones... We will mass-email the users to tell them they might need to change their password when they connect from a new device.

@elmigranto
Collaborator

My thinking is that starting a node process for each of 600k files is a bit wasteful (it takes 50ms just to start, do nothing and exit; about the same as it takes to read the file, parse the JSON, serialize it and print it out):

time node -e ''
node -e ''  0.04s user 0.01s system 92% cpu 0.052 total

time node -e "console.log('%s', JSON.stringify(require('../import-samples/05lala61.json')))"
{"email":"alex.lala@orange.fr","givenName":"Email","middleName":null,"surname":"05lala61","username":"05lala61"}
node -e   0.07s user 0.02s system 93% cpu 0.090 total

Though, doing it this way is a lot less code. But maybe piping together a bunch of event emitters isn't that much code and I'd rather write js than bash :)

Let's see…
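
For comparison, a single-process variant might look roughly like this, assuming Node.js. The convertDirectory name is hypothetical, and convert is any function that maps a parsed input object to a payload. The idea is only that the startup cost is paid once per run instead of once per file.

```javascript
const fs = require('fs');
const path = require('path');

// Converts every *.json file in `dir` inside one Node process, so the
// process startup cost is paid once instead of once per file.
// `convert` is any function mapping a parsed input object to a payload.
function* convertDirectory(dir, convert) {
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith('.json')) continue;
    const file = path.join(dir, name);
    try {
      const input = JSON.parse(fs.readFileSync(file, 'utf8'));
      yield { file, payload: convert(input) };
    } catch (err) {
      // A malformed file is reported instead of aborting the whole run.
      yield { file, error: err.message };
    }
  }
}
```

Each yielded result can be written as one JSON line, for example with for (const r of convertDirectory(dir, convert)) console.log(JSON.stringify(r)).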

@j3k0
Owner Author

j3k0 commented Feb 2, 2017

0.1 seconds per user is ok... With 600,000 users, that'd mean roughly 17 hours of pre-processing to generate the payloads. It's acceptable.
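
The estimate can be checked with quick arithmetic. The 600,000 users and 0.1 s per user come from this thread, and 0.09 s is the wall-clock time measured in the comment above; estimateHours is just an illustrative name, and the figures assume strictly sequential processing.

```javascript
// Estimated wall-clock hours to process `count` files one after another.
function estimateHours(count, secondsPerFile) {
  return (count * secondsPerFile) / 3600;
}

console.log(estimateHours(600000, 0.1).toFixed(1));  // 16.7
console.log(estimateHours(600000, 0.09).toFixed(1)); // 15.0
```
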

@j3k0
Owner Author

j3k0 commented Feb 2, 2017

I guess it's so small that you can also have the nodejs conversion script do the sending of a single request, report on stdout.

Then simply find/xargs the whole directory to the script > output.json, shouldn't require a lot of bash. I can help with bash if needed

@j3k0 j3k0 assigned j3k0 and unassigned elmigranto Feb 12, 2017
casualuser added a commit to casualuser/ganomede-directory that referenced this issue Feb 20, 2017
casualuser added a commit to casualuser/ganomede-directory that referenced this issue Feb 22, 2017
@j3k0 j3k0 closed this as completed Mar 9, 2017
@j3k0 j3k0 removed the in progress label Mar 9, 2017
@elmigranto elmigranto mentioned this issue Apr 21, 2017
2 tasks