Commit

updated blog (#1013)
evisdrenova committed Jan 2, 2024
1 parent f179496 commit e34d9ad
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions marketing/content/blog/introducing-neosync.mdx
@@ -25,11 +25,11 @@ With this context, we started to think about the intersection of these two problems
Today, developers and ML engineers don't have a way to generate high-quality synthetic data and sync it across environments. Whether it's to build and test applications or train and test models, the problem is the same. Neosync aims to solve this problem.

Neosync does two main things. First, it can generate high-quality synthetic data from scratch or anonymize existing data. This is extremely useful for developers who want to build and test their applications locally or in CI, especially if they're working with sensitive data. The status quo is that a developer will either manually <InlineBlogCode children='pg_dump' /> from their production database and <InlineBlogCode children='pg_restore' /> locally, or run a script that does the same thing. It's what almost every developer we've talked to does. That obviously needs to change. Not only is it ridiculously insecure, but it's also wildly inefficient.
-Almost no one we talked to had infrastructure around this workflow. How do you deal with versioning, access control, security, mismatches between local and CI? The list goes on. Neosync solves these problems.
+Almost no one we talked to had infrastructure around this workflow. How do you deal with versioning, access control, security, mismatches between local and CI? The list goes on. Neosync aims to solve these problems.
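The manual workflow the post describes can be sketched as building the two commands a developer typically runs by hand. This is purely illustrative; the connection strings and dump file name are placeholders, not anything from the post.

```python
import shlex

def dump_and_restore_cmds(prod_url: str, local_url: str, dump_file: str = "prod.dump"):
    """Build the dump-then-restore commands developers typically run by hand."""
    # Dump production in pg_dump's custom format so pg_restore can consume it.
    dump = ["pg_dump", prod_url, "--format=custom", f"--file={dump_file}"]
    # Restore into a local database, dropping existing objects first.
    restore = ["pg_restore", "--clean", "--no-owner", f"--dbname={local_url}", dump_file]
    return dump, restore

# Placeholder URLs for illustration only.
dump, restore = dump_and_restore_cmds(
    "postgres://prod-db:5432/app", "postgres://localhost:5432/app_dev"
)
print(shlex.join(dump))
print(shlex.join(restore))
```

Running these against real databases copies raw production data onto a laptop, which is exactly the security and efficiency problem the post calls out.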

-For ML engineers, they need access to high quality synthetic data to train models. Neosync has the notion of a **transformer**. These are flexible modules that you can use to generate pretty much any kind of data. We're starting with the base data types i.e <InlineBlogCode children='float64, int64, strings' /> etc and adding more defined data types such as <InlineBlogCode children='first_name, last_name, address, ssn' /> and many more. We're shipping with 40+ transformers and are constantly adding more. If you're ever used the faker library then we're basically on par with that out of the gate. In the future, we're going to be releasing models that allow you to define the data you want and let the model automatically generate it. Lot's more to come on this.
+ML engineers, meanwhile, need access to high-quality synthetic data to train and fine-tune models. Neosync has the notion of a **transformer**. These are flexible modules that you can use to generate pretty much any kind of data. We're starting with the base data types, i.e. <InlineBlogCode children='float64, int64, strings' />, and adding more defined data types such as <InlineBlogCode children='first_name, last_name, address, ssn' /> and many more. We're shipping with 40+ transformers and are constantly adding more. If you've ever used the faker library, we're basically on par with that out of the gate. In the future, we're going to release models that let you define the data you want and have the model automatically generate it. Lots more to come on this.
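To make the transformer idea concrete, here is a minimal sketch of modules that each generate one kind of synthetic value, keyed by a type name. The registry, decorator, and sample values are all hypothetical illustrations, not Neosync's actual API.

```python
import random

# Hypothetical registry mapping a type name to a generator function.
TRANSFORMERS = {}

def transformer(name):
    """Register a generator under a transformer name."""
    def register(fn):
        TRANSFORMERS[name] = fn
        return fn
    return register

@transformer("int64")
def gen_int64(rng):
    return rng.randint(-2**63, 2**63 - 1)

@transformer("first_name")
def gen_first_name(rng):
    # Tiny illustrative name pool; a real transformer would draw from a large one.
    return rng.choice(["Ada", "Grace", "Alan", "Edsger"])

@transformer("ssn")
def gen_ssn(rng):
    return f"{rng.randint(100, 899):03d}-{rng.randint(10, 99):02d}-{rng.randint(1000, 9999):04d}"

def generate_row(schema, seed=None):
    """Map column name -> transformer name into one synthetic row."""
    rng = random.Random(seed)
    return {col: TRANSFORMERS[t](rng) for col, t in schema.items()}

row = generate_row({"id": "int64", "name": "first_name", "ssn": "ssn"}, seed=42)
```

The appeal of this shape is composability: a schema is just a mapping from columns to transformer names, so swapping a real value for a synthetic one is a one-line change.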

-The second thing Neosync does is that it syncs data across environments. If you're anonymizing data from prod and want to use that locally, you need to first subset the data, because you likely only need a portion of it and then secondly sync it across stage, CI and dev. Neosync natively handles all of this with a workflow orchestration framework powered by (Temporal)(temporal.io). You can even sync it directly to a local DB using the Neosync CLI. For ML engineers, this means syncing data across environments from an S3 bucket or data lake. We made it dead simple and put it on a schedule.
+The second thing Neosync does is sync data across environments. If you're anonymizing data from prod and want to use it locally, you first need to subset the data, since you likely only need a portion of it, and then sync it across stage, CI, and dev. Neosync natively handles all of this with a workflow orchestration framework powered by [Temporal](https://temporal.io). You can even sync directly to a local DB using the Neosync CLI. For ML engineers, this means syncing data across environments from an S3 bucket or data lake. We made it dead simple and put it on a schedule.
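The subset-then-sync flow can be sketched with an in-memory model: filter a slice of "prod" rows, then copy that slice into each target environment. The row shape, predicate, and environment names here are illustrative assumptions, not Neosync's data model.

```python
def subset(rows, predicate):
    """Keep only the rows you actually need downstream."""
    return [r for r in rows if predicate(r)]

def sync(rows, targets):
    """Copy the subset into each target environment's store."""
    for name, store in targets.items():
        store.clear()       # replace stale data on every sync run
        store.extend(rows)

# Toy "prod" table: even ids are tagged "us", odd ids "eu".
prod = [{"id": i, "region": "us" if i % 2 == 0 else "eu"} for i in range(10)]
targets = {"stage": [], "ci": [], "dev": []}

us_only = subset(prod, lambda r: r["region"] == "us")
sync(us_only, targets)
```

In a real pipeline the two steps would be activities in a workflow engine such as Temporal, so retries and scheduling come for free rather than living in a cron script.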

# Why now?

