updated use-cases section (#1996)
evisdrenova committed May 21, 2024
1 parent ab9ddba commit c1d62b6
Showing 17 changed files with 165 additions and 193 deletions.
2 changes: 1 addition & 1 deletion docs/docs/connections/postgres.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ In order to connect to your Postgres database, first navigate to **Connections**

Then select a Postgres compatible database such as Neon, Supabase or just the base Postgres connection.

![New Connection Page](/img/conn.png)
![connections](/img/connectionsList.png)

You'll now be taken to the Postgres connection form.

Expand Down
2 changes: 1 addition & 1 deletion docs/docs/overview/core-concepts.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ to see what jobs executed and their status.{' '}

### Connections

![connections](https://assets.nucleuscloud.com/neosync/docs/connections-page.png)
![connections](/img/connectionsList.png)

Connections are integrations with upstream and/or downstream systems such as Postgres, S3 and Mysql. Jobs use connections to move data across systems. Connections are created outside of jobs so that you can re-use connections across multiple jobs without re-creating them every time.

Expand Down
54 changes: 54 additions & 0 deletions docs/docs/overview/core-features.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
---
title: Core Features
id: core-features
hide_title: false
slug: /core-features
---

## Introduction

Neosync ships with a number of features, but four core features drive the platform.

## Anonymization

Neosync provides the core anonymization functionality through [transformers](/platform#transformers). Transformers anonymize or mask source data in any way you'd like. Neosync ships with a number of pre-built transformers to help you get started or you can write your own user defined transformer.

You can use the prebuilt Neosync transformers in order to anonymize your sensitive data or generate new data that looks just like your production data. The Schema page is where you can select, at the column level, how you want to anonymize your data.

![anon](/img/coreanon.png)

You have full control over how you configure your schema and can even create your own transformer with your own custom transformation logic if you wish to do so. Neosync is a powerful anonymization engine that can deliver a better developer experience for engineering teams.
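To make the idea concrete, here is a minimal Python sketch of the kind of logic a user-defined transformer might contain. This is illustrative only and is not Neosync's actual transformer API: it deterministically masks the local part of an email so the same input always maps to the same masked value, which preserves joins across tables.

```python
import hashlib


def mask_email(email: str, salt: str = "static-salt") -> str:
    """Replace the local part of an email with a deterministic hash.

    The same input always produces the same masked value, so foreign-key
    relationships and joins on email columns still line up after masking.
    """
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode()).hexdigest()[:8]
    return f"user_{digest}@{domain}"


masked = mask_email("jane.doe@example.com")
```

Because the hash is salted and deterministic, re-running a sync produces stable masked values without exposing the original address.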

## Synthetic Data Generation

Synthetic data is useful for testing applications and services in insecure development and staging environments where you don't want sensitive data floating around. Neosync helps teams create high-quality synthetic data that is representative of their production data using our [transformers](/platform#transformers). There are multiple ways to generate high-quality synthetic data, and the right one depends on the use-case.

Neosync can generate synthetic data from scratch, which is useful for testing new features that don't yet have any data or when the current production data is too sensitive to work with. We give you different options for generating synthetic data so that it fits your schema and works with your applications. These options are transformer specific and depend on the data being generated. You can easily seed an entire database with synthetic data to get started, or create synthetic data for just a single column.
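As an illustration of generating schema-conforming rows from scratch, here is a small Python sketch using only the standard library. The table name and columns are hypothetical, and this is not Neosync's generator implementation, just the general idea:

```python
import random
import string

random.seed(42)  # fixed seed so repeated runs are reproducible


def synthetic_user(user_id: int) -> dict:
    """Fabricate one row for a hypothetical `users` table."""
    first = "".join(random.choices(string.ascii_lowercase, k=6)).title()
    return {
        "id": user_id,
        "first_name": first,
        "email": f"{first.lower()}{user_id}@example.com",
        "age": random.randint(18, 90),
    }


# Seed three rows; a real run would generate however many the schema needs.
rows = [synthetic_user(i) for i in range(1, 4)]
```

In practice, per-column generators like this are what lets the output fit your schema's types and constraints while containing no real customer data.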

Neosync also supports integrating with LLM providers such as OpenAI, Anthropic, TogetherAI and more to deliver AI-generated synthetic data.

![new-trans](/img/llmprompt.png)

Generating synthetic data is important in order to test services and applications while protecting your sensitive data. Neosync supports many different kinds of synthetic data generation, from full synthetic data generation to partial synthetic data generation across most data types.

## Subsetting

Subsetting reduces the size of a large dataset so that it is usable in another environment with fewer resources. For example, if you have a 100GB database, you'll likely want to filter it down to a size you can use locally.

Neosync helps you subset your data by taking a SQL statement that defines how to filter your data on a table-by-table basis. This gives you a flexible way of building your destination data set. Once you've connected Neosync to your source database and configured your schema and mappings, you can then subset that data further by selecting a source table to start with.

![subset](/img/datasubsetting.png)

Neosync will automatically ensure relational integrity in the data, making sure that the resulting dataset, post-subset, still has all of the foreign key constraints you had in the original data set. Additionally, Neosync can subset self-referencing tables and circular dependencies, provided there is at least one nullable column within the circular dependency cycle to serve as a viable entry point in your database schema.
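Conceptually, the relational-integrity guarantee works like this Python sketch: filter the root table first, then keep only child rows whose foreign keys point at a surviving parent. The table names and columns here are hypothetical, and real subsetting handles arbitrary FK graphs, not just one parent/child pair.

```python
# In-memory stand-ins for a parent table and a child table with a FK.
customers = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 40},
    {"id": 11, "customer_id": 2, "total": 99},
    {"id": 12, "customer_id": 1, "total": 15},
]


def subset(parents, children, predicate):
    """Filter the root table, then drop child rows whose foreign key
    no longer points at a kept parent, preserving referential integrity."""
    kept_parents = [p for p in parents if predicate(p)]
    kept_ids = {p["id"] for p in kept_parents}
    kept_children = [c for c in children if c["customer_id"] in kept_ids]
    return kept_parents, kept_children


# Keep only EU customers and the orders that reference them.
subset_customers, subset_orders = subset(
    customers, orders, lambda c: c["region"] == "EU"
)
```

The key design point is the ordering: parents are filtered first so that every surviving child row is guaranteed a valid foreign-key target in the destination.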

Once you've subsetted the data, Neosync will push the result set to your destination(s).

Neosync has powerful subsetting features which allow you to create smaller subsets of your data while maintaining relational integrity. This is useful for local and CI testing where you don't want or need the entire dataset but don't want to spend time querying, joining and filtering the data yourself.

## Orchestration

At its core, Neosync is an orchestration platform with anonymization, synthetic data and subsetting capabilities. We rely heavily on [Temporal](https://temporal.com) for our orchestration backbone as it provides us with a lot of power out of the box.

![anon](/img/orches.png)

Depending on the type of [Job](../core-concepts#jobs) you create, you can sync data from a source database to one or many destination databases. This is where the orchestration comes into play.
98 changes: 49 additions & 49 deletions docs/docs/overview/introduction.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,14 @@ import { CustomCardList } from '@site/src/CustomComponents/CustomCardList.tsx';
import { BsFunnel } from 'react-icons/bs';

import {
PiArrowCounterClockwise,
PiArrowsSplitLight,
PiDatabase,
PiFlaskLight,
PiPlugsLight,
PiRocketLaunchLight,
PiShieldCheckLight,
PiTestTube,
} from 'react-icons/pi';

# Welcome to the Neosync Docs
Expand All @@ -34,88 +38,84 @@ testing, debugging and developer experience.

<HeroImage />

## Learn about Neosync
## Core Features

The best way to learn Neosync is to check out our core concepts and familiarize yourself with Jobs, Runs, Connections and Transformers.
Neosync can be used in many different ways to support different use-cases. Check out the most common use-cases below.

<CustomCardList
cards={[
{
title: 'Jobs',
description:
'Jobs are async workflows that transform data and sync it between connections.',
link: 'core-concepts#jobs',
icon: <StackIcon />,
title: 'Anonymization',
description: 'Anonymize sensitive data for safe testing and development',
link: '/core-features#anonymization',
icon: <LinkBreak1Icon />,
},
{
title: 'Runs',
title: 'Synthetic Data Generation',
description:
'Runs are instances of a job that can be started, paused and played back.',
link: 'core-concepts#runs',
icon: <LayersIcon />,
'Generate high-quality synthetic data from existing data or from scratch.',
link: '/core-features#synthetic-data-generation',
icon: <PiFlaskLight />,
},
{
title: 'Connections',
description:
'Connections are sources of data or destinations that you sync using Jobs such as databases.',
link: 'core-concepts#connections',
icon: <PiPlugsLight />,
title: 'Subsetting',
description: 'Subset your data to fit local and stage environments.',
link: '/core-features#subsetting',
icon: <BsFunnel />,
},
{
title: 'Transformers',
description:
'Transformers are data-type specific modules that anonymize or generate data.',
link: 'core-concepts#transformers',
icon: <LightningBoltIcon />,
title: 'Orchestration',
description: 'Orchestrate and sync data across environments',
link: '/core-features#orchestration',
icon: <PiArrowsSplitLight />,
},
]}
/>

## Deploying Neosync

Once you're ready to deploy Neosync, check out our Deployment guide to see the available deployment options.

<CustomCard
title="Deploy"
description="Learn how to deploy Neosync using Docker Compose or Kubernetes. "
link="/deploy/introduction"
icon={<PiRocketLaunchLight />}
/>

## Use cases

Neosync can be used in many different ways to support different use-cases. Check out the most common use-cases below.

<CustomCardList
cards={[
{
title: 'Anonymize Data',
description: 'Anonymize sensitive data for safe testing and development',
link: 'overview/use-cases/anonymization',
icon: <LinkBreak1Icon />,
title: 'Test code against Production',
description: 'Safely test your code against Production data',
link: '/usecases#safely-test-your-code-against-production-data',
icon: <PiTestTube />,
},
{
title: 'Synthetic Data',
description:
'Generate high-quality synthetic data from existing data or from scratch.',
link: 'overview/use-cases/synthetic-data',
icon: <PiFlaskLight />,
title: 'Reproduce bugs locally',
description: 'Easily reproduce Production bugs locally',
link: '/usecases#easily-reproduce-production-bugs-locally',
icon: <PiArrowCounterClockwise />,
},
{
title: 'Subset Data',
description: 'Subset your data to fit local and stage environments.',
link: 'overview/use-cases/subsetting',
icon: <BsFunnel />,
title: 'Fix broken staging environments',
description: 'Fix broken staging environments',
link: '/usecases#fix-broken-staging-environments',
icon: <PiDatabase />,
},
{
title: 'Replicate data',
description: 'Easily replicate source data to multiple environments.',
link: 'overview/use-cases/replication',
icon: <PiArrowsSplitLight />,
title: 'Reduce your compliance scope',
description: 'Frictionless compliance, security and data privacy',
link: '/usecases#reduce-your-compliance-scope',
icon: <PiShieldCheckLight />,
},
]}
/>

## Deploying Neosync

Once you're ready to deploy Neosync, check out our Deployment guide to see the available deployment options.

<CustomCard
title="Deploy"
description="Learn how to deploy Neosync using Docker Compose or Kubernetes. "
link="/deploy/introduction"
icon={<PiRocketLaunchLight />}
/>

## Contributing to Neosync

We love contributors and are happy to accept PRs. The best way to contribute to Neosync is to go ahead and try it out. If you find something is not right, you can report an issue [here](https://github.com/nucleuscloud/neosync/issues/new?assignees=&labels=bug&template=bug_report.md).
File renamed without changes.
38 changes: 38 additions & 0 deletions docs/docs/overview/use-cases.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
---
title: Use cases
id: usecases
hide_title: false
slug: /usecases
---

## Introduction

Neosync is a great way to anonymize sensitive data and sync it across multiple environments for better testing, debugging and developer experience. Typically, teams point Neosync at a snapshot of their production database and anonymize their production data to make it usable in lower-level environments. This is a great way to get production-like data without the security and compliance risks.

While there are many use-cases for Neosync, we're going to focus on the main ones that we see from customers.

## Safely test your code against Production data

Many developers have experienced the pain of code that works locally and in staging but then fails in production, whether through an edge case or some other bug. We've all said, "It works locally though ..." at one point or another in our careers as developers. A lot of these errors come up because the data you test against locally isn't representative of production data. Production data is messy and has sharp edges, and that kind of messiness is really difficult to manufacture in mock data.

One of the main use cases of Neosync is to anonymize production data and generate synthetic data so that developers can use it locally to build and test their code. This gets them as close to testing in production as they can possibly get without any of the security and privacy risks. Not only is this a much better developer experience, it also has massive customer benefits. When you're able to test with realistic data, you produce more resilient applications that fail less. This translates directly into happier customers and less time wasted fixing bugs.

## Easily reproduce Production bugs locally

Whenever we come across a bug, the first thing that we want to do is reproduce it locally so we can start to fix it. The problem is that if you don't have great data to work with that closely matches a customer's production environment, you have to hunt for the bug in order to reproduce it. This can waste a lot of time and can result in unhappy customers and frustrated developers.

The ideal debugging process would be to reproduce the customer's data state locally, then execute the same action the customer took and see if the bug appears. This is where Neosync can help. Neosync can anonymize your production data and generate synthetic data so that you can use it locally, and subset the data by a customer_id or any other SQL filter so that you only get that customer's data. This makes the data much easier to work with.

As a developer, this is the best developer experience you can ask for as well. You're able to see almost exactly what the customer is seeing without any of the security problems and you can quickly understand what is going on. This helps you identify and fix the bug faster and make your customers happier.

## Fix broken staging environments

One of the biggest sources of frustration for developers is a broken staging environment. Developers rely on the fidelity of staging environments for access to data as well as the quality of the data. Whether it's hydrating local environments or running staging CI acceptance tests, it's important to have high quality staging data.

This is where Neosync can come in. Neosync can anonymize and generate synthetic data to populate staging environments with high quality data that gives developers a great developer experience.

## Reduce your compliance scope

Data privacy regulations such as HIPAA, GDPR and DPDP require companies in covered countries or industries to protect customer data. For example, a health technology company collecting PHI (personal health information) is required to secure that data according to HIPAA regulations. However, this can be at odds with what developers and engineering teams need. An engineering team needs data to build and test new features; if they use production data, however, their development and even local environments can fall within the scope of HIPAA compliance. That means they have to protect those environments the same way they protect production.

This can place a big burden on security, compliance and engineering teams and isn't the right approach. One of the use-cases that we see for Neosync is to anonymize and generate synthetic data so that you can use it locally and reduce your compliance scope while still having access to high quality data. This not only helps engineering teams but also reduces the compliance and audit scope for security and compliance teams.
32 changes: 0 additions & 32 deletions docs/docs/overview/use-cases/anonymization.mdx

This file was deleted.

24 changes: 0 additions & 24 deletions docs/docs/overview/use-cases/replication.mdx

This file was deleted.
