are.na filtering and validation lib #7

Open
g-a-v-i-n opened this issue Feb 21, 2018 · 6 comments
Labels
enhancement New feature or request tool Tool-specific thread.


@g-a-v-i-n

g-a-v-i-n commented Feb 21, 2018

macarena and arentv heavily rely on block type filtering and url validation/sanitization. assuming tools in the toolkit will as well, I'm starting to write a general purpose 'library' to make this less of a headache.

here's an example config:

const config = {
  whitelist: {
    source: ['youtube', 'vimeo', 'upload', 'soundcloud'],
    fileType: ['mp3', 'flac', 'wav'],
    blockType: ['attachment', 'media'],
  },
  validation: {
    internalValidators: {
      isValidHref: true,
      HTTPSonly: false,
    },
    externalValidators: {
      reactPlayer: (item) => reactPlayer.canPlay(item),
      imageIntegrity: (item) => validateImageIntegrity(item),
    }
  },
  sanitization: {
    forceHttps: false, 
    fillTitle: true,
  }
}

the pipeline goes something like this. ideally this is lazy and only runs validation / regex etc on items that cannot be easily rejected according to blockType

get channel contents |>
does blockType pass? append with message |>
getURL |>
decide if the URL is valid and sanitize, append with message |>
any other external validators (reactPlayer), append with message |>
superficial sanitization (fill untitled blocks etc) |>
return copy of contents with messages
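
A minimal sketch of how the lazy part of that pipeline could look, just to make the idea concrete. Everything here is an assumption for illustration: the `stages` array, `validateBlock`, `validateContents`, and the flat `block.blockType` / `block.url` fields are hypothetical names, not the validate-arena API. The point is only that a block rejected by the cheap blockType check never reaches the URL parsing stage.

```javascript
// Hypothetical sketch: none of these names come from an existing library.
// Each stage takes a block and returns { pass, isOfType, message }.
const stages = [
  {
    name: 'blockType',
    run: (block, config) => {
      const pass = config.whitelist.blockType.includes(block.blockType);
      return {
        pass,
        isOfType: String(block.blockType).toUpperCase(),
        message: pass ? '' : 'blockType is not whitelisted',
      };
    },
  },
  {
    name: 'url',
    run: (block) => {
      try {
        new URL(block.url); // throws a TypeError on malformed input
        return { pass: true, isOfType: 'VALID_HREF', message: '' };
      } catch (err) {
        return { pass: false, isOfType: 'INVALID_HREF', message: 'url could not be parsed' };
      }
    },
  },
];

function validateBlock(block, config) {
  const validation = {};
  for (const stage of stages) {
    const result = stage.run(block, config);
    validation[stage.name] = result;
    if (!result.pass) break; // lazy: skip the remaining, costlier stages
  }
  return { ...block, validation }; // a copy, the input block is not mutated
}

const validateContents = (contents, config) =>
  contents.map((block) => validateBlock(block, config));
```

External validators and sanitizers would be further entries in `stages`; the order of the array is what makes the cheap checks run first.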

i propose it is appropriate to append messages about this process to the block itself in the following fashion:

{
  ...contents,
  validation: {
    source: {
      pass: true,
      isOfType: 'YOUTUBE',
      message: '',
    },
    fileType: {
      pass: false,
      isOfType: 'NO_FILE',
      message: '',
    },
    blockType: {
      pass: true,
      isOfType: 'MEDIA',
      message: '',
    },
    // external validators ( idk ? )
    reactPlayer: {
      pass: true,
      isOfType: 'CANPLAY',
      message: '',
    },
  },
}
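
In the example above the block fails fileType but passes everything else, so a consuming tool probably wants to choose which checks it cares about rather than treat any failure as a rejection. A rough sketch of helpers for reading that shape follows; `failedChecks`, `passesAll` and `passes` are hypothetical names that assume the `validation` object proposed above.

```javascript
// Hypothetical helpers; they assume the `validation` shape proposed above.

// List every check that did not pass, with its type and message.
function failedChecks(block) {
  return Object.entries(block.validation || {})
    .filter(([, result]) => !result.pass)
    .map(([name, result]) => ({
      name,
      isOfType: result.isOfType,
      message: result.message,
    }));
}

// True only if no recorded check failed.
const passesAll = (block) => failedChecks(block).length === 0;

// True if every named check was recorded and passed.
const passes = (block, names) =>
  names.every((name) => {
    const result = (block.validation || {})[name];
    return Boolean(result) && result.pass === true;
  });
```

A player tool could then keep a YouTube block with `passes(block, ['source', 'blockType'])` even though its fileType check reports NO_FILE.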
@g-a-v-i-n
Author

The final Q is – should this be a node module?

@hxrts
Member

hxrts commented Feb 21, 2018

So great. Within the Toolkit, I was thinking modules like this could be loaded once, and then provide a global interface for other tools. As a node module, this model would imply one import, rather than a separate import for each tool. I'm open to other strategies, but I think this will reduce page weight and complexity in the long run.

@hxrts
Member

hxrts commented Feb 21, 2018

One piece of related functionality that's also worth mentioning is a "block fetch" module. Once block data has been validated and the block type determined, it would be quite useful to retrieve specific data from those objects.

Examples

mp3:  parse ID3 metadata & return non-empty fields
jpeg: parse embedded metadata + determine file size
http: navigate to url, return <title>, attempt return of body text
pdf:  title / author / number of pages

Presumably this would be a separate module, unless you're trying to make a swiss army knife.

@g-a-v-i-n
Author

single load sounds good to me. each validator should just be plug and play, so each tool can provide its own custom validation methods if necessary.
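
A rough sketch of what plug and play could mean in practice, assuming validators are plain functions that take a block and return something truthy or falsy. `sharedValidators`, `runValidators` and the `hasTitle` check are hypothetical names made up for this example.

```javascript
// Hypothetical sketch: validators shipped with the lib, keyed by name.
const sharedValidators = {
  hasTitle: (block) => typeof block.title === 'string' && block.title.trim() !== '',
};

// Run the shared validators plus whatever the calling tool passes in.
function runValidators(block, toolValidators = {}) {
  // Tool-provided entries win when names clash.
  const validators = { ...sharedValidators, ...toolValidators };
  const validation = {};
  for (const [name, validate] of Object.entries(validators)) {
    try {
      const pass = Boolean(validate(block));
      validation[name] = { pass, message: pass ? '' : `${name} rejected this block` };
    } catch (err) {
      // A throwing validator counts as a failure instead of crashing the run.
      validation[name] = { pass: false, message: `${name} threw: ${err.message}` };
    }
  }
  return { ...block, validation };
}
```

A tool would call something like `runValidators(block, { reactPlayer: (b) => reactPlayer.canPlay(b) })` and get back a copy of the block with one entry per validator.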

and yes, i think this is where the tinyAPI parser from mac.are.na would be a good fit

@g-a-v-i-n
Author

https://github.com/gavinpatkinson/validate-arena
here's a first pass

@hxrts hxrts added tool Tool-specific thread. enhancement New feature or request labels Feb 26, 2018
@g-a-v-i-n
Author

g-a-v-i-n commented Feb 26, 2018

ok after building /prototyping this a little i have a thought about how general this can/should/wants to be:

rn we have a config as follows. Note the prescribed block attributes.

const validatorConfig = {
    whitelists: {
      class: ['Attachment', 'Media'],
      providerName: ['YouTube', 'Vimeo', 'SoundCloud'],
      extension: ['mp3', 'flac', 'wav'],
      state: ['available'],
    },
    sanitizers: {
      cleanURL: block => cleanURL(block),
      fillTitle: block => fillTitle(block.title),
    },
    validators: {
      reactPlayerValidator: block => reactPlayerValidator(block),
    },
}

BUT what we could do instead is:

const validatorConfig = {
    whitelists: {
      'block.class': ['Attachment', 'Media'],
      'block.source.provider.name': ['YouTube', 'Vimeo', 'SoundCloud'],
      'block.attachment.extension': ['mp3', 'flac', 'wav'],
      'block.state': ['available'],
      'any.sharedKey': ['something'],
      'channel.whatever.something.heck': ['yadda'],
    },
    sanitizers: {
      cleanURL: block => cleanURL(block),
      fillTitle: block => fillTitle(block.title),
    },
    validators: {
      reactPlayerValidator: block => reactPlayerValidator(block),
    },
}

this way the lib becomes more of a general purpose object validator which is p cool.
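
A minimal sketch of how path-keyed whitelists could be resolved, assuming the keys are quoted strings (a dotted key is not valid as a bare object key) and that the first path segment names an entry in a context object such as `{ block, channel }`. `getPath` and `checkWhitelists` are hypothetical names, and treating a missing path as a failed check is also just an assumption.

```javascript
// Hypothetical sketch of path-based whitelist checks.

// Walk a dotted path like 'block.source.provider.name'; undefined if any step is missing.
function getPath(root, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), root);
}

// Check every whitelist entry against the value found at its path.
function checkWhitelists(context, whitelists) {
  const results = {};
  for (const [path, allowed] of Object.entries(whitelists)) {
    const value = getPath(context, path);
    results[path] = { pass: allowed.includes(value), value };
  }
  return results;
}

const exampleResults = checkWhitelists(
  {
    block: {
      class: 'Media',
      source: { provider: { name: 'YouTube' } },
      state: 'available',
    },
  },
  {
    'block.class': ['Attachment', 'Media'],
    'block.source.provider.name': ['YouTube', 'Vimeo', 'SoundCloud'],
    'block.attachment.extension': ['mp3', 'flac', 'wav'],
  },
);
```

Here the YouTube block has no `attachment`, so the extension entry resolves to `undefined` and fails while the other two pass. How a key like `any.sharedKey` should be resolved is left open in this sketch.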

The other q is: should each process (i.e. whitelisting, sanitization, validation) be a separate method?
