Commit
1 parent
ab9ddba
commit c1d62b6
Showing
17 changed files
with
165 additions
and
193 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
--- | ||
title: Core Features | ||
id: core-features | ||
hide_title: false | ||
slug: /core-features | ||
--- | ||
|
||
## Introduction | ||
|
||
Neosync ships with a number of features, but four core features drive the platform: anonymization, synthetic data generation, subsetting, and orchestration. | ||
|
||
## Anonymization | ||
|
||
Neosync provides its core anonymization functionality through [transformers](/platform#transformers). Transformers anonymize or mask source data in whatever way you configure. Neosync ships with a number of pre-built transformers to help you get started, or you can write your own user-defined transformer. | ||
|
||
You can use the pre-built Neosync transformers to anonymize your sensitive data or to generate new data that looks just like your production data. On the Schema page, you select, at the column level, how you want to anonymize your data. | ||
|
||
![anon](/img/coreanon.png) | ||
|
||
You have full control over how you configure your schema, and you can create your own transformer with custom transformation logic. Neosync is a powerful anonymization engine that can deliver a better developer experience for engineering teams. | ||
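
To make the column-level idea concrete, here is a minimal sketch in Python. It is not Neosync's transformer API: the function names (`mask_email`, `redact`, `passthrough`) and the `COLUMN_TRANSFORMERS` mapping are hypothetical, chosen only to show how a per-column choice of transformer can be applied to a row.

```python
import hashlib


def mask_email(value: str) -> str:
    """Replace the local part with a stable hash and keep the domain."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@{domain}"


def redact(value: str) -> str:
    """Replace every character with an asterisk, preserving length."""
    return "*" * len(value)


def passthrough(value):
    """Leave the value unchanged."""
    return value


# Hypothetical column -> transformer mapping, mirroring a per-column choice.
COLUMN_TRANSFORMERS = {
    "id": passthrough,
    "email": mask_email,
    "ssn": redact,
}


def anonymize_row(row: dict) -> dict:
    """Apply the configured transformer to each column of one row.

    Columns without an entry fall back to passthrough in this sketch.
    """
    return {
        column: COLUMN_TRANSFORMERS.get(column, passthrough)(value)
        for column, value in row.items()
    }


row = {"id": 1, "email": "jane@example.com", "ssn": "123-45-6789"}
anonymized = anonymize_row(row)
print(anonymized)
```

Hashing the local part keeps the output deterministic, so the same input email always maps to the same masked value. Falling back to passthrough for unmapped columns keeps the sketch short; a stricter setup would likely reject unmapped columns rather than copy them through.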
|
||
## Synthetic Data Generation | ||
|
||
Synthetic data is useful for testing applications and services in less secure development and staging environments, where you don't want your sensitive data floating around. Using our [transformers](/platform#transformers), Neosync helps teams create high-quality synthetic data that is representative of their production data. There are multiple ways to generate high-quality synthetic data, and the right one depends on the use case. | ||
|
||
Neosync can generate synthetic data from scratch, which makes it easy to test new features that don't have any data yet, or to work around production data that is too sensitive to use. We give you different options for generating synthetic data so that it fits your schema and works with your applications. These options are transformer-specific and depend on the data being generated. You can seed an entire database with synthetic data to get started, or create synthetic data for just a single column. | ||
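
As a rough illustration of generating rows that fit a schema, the sketch below uses only Python's standard library. The schema format, the type names (`int`, `string`, `email`, `bool`), and the `generate_rows` function are assumptions made for this example and do not reflect how Neosync's transformers are configured.

```python
import random
import string


def generate_rows(schema: dict, count: int, seed: int = 0) -> list:
    """Generate `count` synthetic rows whose columns follow `schema`.

    `schema` maps a column name to a hypothetical type name.
    A fixed seed makes the output reproducible.
    """
    rng = random.Random(seed)

    def random_word() -> str:
        return "".join(rng.choices(string.ascii_lowercase, k=8))

    # One generator per (hypothetical) column type.
    generators = {
        "int": lambda: rng.randint(1, 1_000_000),
        "string": random_word,
        "email": lambda: random_word() + "@example.com",
        "bool": lambda: rng.random() < 0.5,
    }
    return [
        {column: generators[kind]() for column, kind in schema.items()}
        for _ in range(count)
    ]


schema = {"id": "int", "name": "string", "email": "email", "active": "bool"}
rows = generate_rows(schema, count=3, seed=42)
for generated in rows:
    print(generated)
```

Seeding the generator is a deliberate choice here: reproducible output makes it easier to rerun a failing test against the same synthetic rows.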
|
||
Neosync also supports integrating with LLM providers such as OpenAI, Anthropic, TogetherAI and more to deliver AI-generated synthetic data. | ||
|
||
![new-trans](/img/llmprompt.png) | ||
|
||
Generating synthetic data is important in order to test services and applications while protecting your sensitive data. Neosync supports many different kinds of synthetic data generation, from full synthetic data generation to partial synthetic data generation across most data types. | ||
|
||
## Subsetting | ||
|
||
Subsetting reduces the size of a large dataset so that it is usable in another environment with fewer resources. For example, if you have a large 100 GB database, you'll likely want to filter it down before using it locally. | ||
|
||
Neosync can help you subset your data by taking in a SQL statement that describes how you want to filter your data on a table-by-table basis. This gives you a flexible way of building your destination dataset. Once you've connected Neosync to your source database and configured your schema and mappings, you can subset that data further by selecting a source table to start with. | ||
|
||
![subset](/img/datasubsetting.png) | ||
|
||
Neosync will automatically ensure relational integrity in the data, making sure that the resulting dataset, post-subset, still has all of the foreign key constraints you had in the original data set. Additionally, Neosync can subset self-referencing tables and circular dependencies, provided there is at least one nullable column within the circular dependency cycle to serve as a viable entry point in your database schema. | ||
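
The idea of filtering one table and keeping related tables consistent can be sketched with Python's built-in `sqlite3` module. This is an illustration of the concept, not Neosync's implementation: the `customers` and `orders` tables, their columns, and the `where_clause` value are all made up for the example.

```python
import sqlite3

# In-memory stand-in for a source database (hypothetical schema and data).
src = sqlite3.connect(":memory:")
src.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1, 'eu'), (2, 'us'), (3, 'eu');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 2);
    """
)

# The subset filter for the starting table. It is interpolated into the SQL
# below only because this is a trusted, hard-coded example value.
where_clause = "region = 'eu'"

# Step 1: filter the starting table.
customers = src.execute(
    f"SELECT id, region FROM customers WHERE {where_clause} ORDER BY id"
).fetchall()

# Step 2: keep only child rows whose foreign key points at a kept parent,
# so the subset has no dangling references.
orders = src.execute(
    f"""
    SELECT id, customer_id FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE {where_clause})
    ORDER BY id
    """
).fetchall()

# Sanity check: every order in the subset references a kept customer.
kept_ids = {customer_id for customer_id, _ in customers}
dangling = [order for order in orders if order[1] not in kept_ids]

print(customers)
print(orders)
print(dangling)
```

The example only walks one parent-to-child relationship. Self-referencing tables and circular dependencies need more care, which is where the nullable-column requirement described above comes in.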
|
||
Once you've subsetted the data, Neosync will push the result set to your destination(s). | ||
|
||
Neosync has powerful subsetting features which allow you to create smaller subsets of your data while maintaining relational integrity. This is useful for local and CI testing where you don't want or need the entire dataset but don't want to spend time querying, joining and filtering the data yourself. | ||
|
||
## Orchestration | ||
|
||
At its core, Neosync is an orchestration platform with anonymization, synthetic data and subsetting capabilities. We rely heavily on [Temporal](https://temporal.com) as our orchestration backbone because it provides us with a lot of power out of the box. | ||
|
||
![anon](/img/orches.png) | ||
|
||
Depending on the type of [Job](../core-concepts#jobs) you create, you can sync data from a source database to one or many destination databases. This is where the orchestration comes into play. |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
--- | ||
title: Use cases | ||
id: usecases | ||
hide_title: false | ||
slug: /usecases | ||
--- | ||
|
||
## Introduction | ||
|
||
Neosync is a great way to anonymize sensitive data and sync it across multiple environments for better testing, debugging and developer experience. Typically, teams point Neosync at a snapshot of their production database and anonymize the production data to make it usable in lower-level environments. This gives them production-like data without the security and compliance risk. | ||
|
||
While there are many use-cases for Neosync, we're going to focus on the main ones that we see from customers. | ||
|
||
## Safely test your code against Production data | ||
|
||
Many developers have experienced the pain of code that works locally and in staging, then fails in production because of an edge case or some other bug. We've all said, "It works locally though ..." at one point or another in our careers as developers. Many of these errors come up because the data you test against locally isn't representative of production data. Production data is messy and has sharp edges, and that kind of messiness is really difficult to manufacture in mock data. | ||
|
||
One of the main use cases of Neosync is to anonymize production data and generate synthetic data so that developers can build and test their code against it locally. This gets them as close to testing in production as they can possibly get, without the security and privacy risk. Not only is it a much better developer experience, it also has significant customer benefits. When you test with realistic data, you produce more resilient applications that fail less often. This translates directly into happier customers and less time wasted fixing bugs. | ||
|
||
## Easily reproduce Production bugs locally | ||
|
||
Whenever we come across a bug, the first thing that we want to do is reproduce it locally so we can start to fix it. The problem is that if you don't have great data to work with that closely matches a customer's production environment, you have to hunt for the bug in order to reproduce it. This can waste a lot of time and can result in unhappy customers and frustrated developers. | ||
|
||
The ideal debugging process would be to reproduce the customer's data state locally, execute the same action that the customer took, and see whether the bug appears. This is where Neosync can help. Neosync can anonymize your production data and generate synthetic data so that you can use it locally, and it can subset the data by a customer_id or any other SQL query so that you only get that customer's data. This makes the data much easier to work with. | ||
|
||
This is also the best experience you can ask for as a developer. You see almost exactly what the customer is seeing, without the security problems, and you can quickly understand what is going on. This helps you identify and fix the bug faster and makes your customers happier. | ||
|
||
## Fix broken staging environments | ||
|
||
One of the biggest sources of frustration for developers is a broken staging environment. Developers rely on the fidelity of staging environments for access to data as well as the quality of the data. Whether it's hydrating local environments or running staging CI acceptance tests, it's important to have high quality staging data. | ||
|
||
This is where Neosync can come in. Neosync can anonymize and generate synthetic data to populate staging environments with high quality data that gives developers a great developer experience. | ||
|
||
## Reduce your compliance scope | ||
|
||
Many data privacy regulations, such as HIPAA, GDPR and DPDP, require companies in the countries or industries they cover to protect customer data. For example, a health technology company that collects PHI (protected health information) is required to secure that data according to HIPAA regulations. However, this can be at odds with what developers and engineering teams need. An engineering team needs data to build and test new features, but if it uses production data, its development and even local systems can fall within the scope of HIPAA compliance. That means the team has to protect its development and local environments the same way it would protect its production environment. | ||
|
||
This can place a big burden on security, compliance and engineering teams and isn't the right approach. One of the use-cases that we see for Neosync is to anonymize and generate synthetic data so that you can use it locally and reduce your compliance scope while still having access to high quality data. This not only helps engineering teams but also reduces the compliance and audit scope for security and compliance teams. |
This file was deleted.
This file was deleted.