Ambry

Ambry is a distributed object store that supports storage of trillions of small immutable objects (50K-100K) as well as billions of large objects. It was specifically designed to store and serve media objects in web companies. However, it can also be used as a general-purpose storage system to store DB backups, search indexes or business reports. The system has the following characteristics:

  1. Highly available and horizontally scalable
  2. Low latency and high throughput
  3. Optimized for both small and large objects
  4. Cost effective
  5. Easy to use

Requires at least JDK 1.8.

Documentation

Detailed documentation is available at https://github.com/linkedin/ambry/wiki

Research

Paper introducing Ambry at SIGMOD 2016 -> http://dprg.cs.uiuc.edu/docs/SIGMOD2016-a/ambry.pdf

Reach out to us at ambrydev@googlegroups.com if you would like us to list a paper that is based off of research on Ambry.

Getting Started

Step 1: Download the code, build it and prepare for deployment.

To get the latest code and build it, run:

$ git clone https://github.com/linkedin/ambry.git 
$ cd ambry
$ ./gradlew allJar
$ cd target
$ mkdir logs

Ambry uses files that provide information about the cluster to route requests from the frontend to servers and for replication between servers. We will use a simple clustermap that contains a single server with one partition. The partition will use /tmp as the mount point.

Step 2: Deploy a server.
$ nohup java -Dlog4j.configuration=file:../config/log4j.properties -jar ambry.jar --serverPropsFilePath ../config/server.properties --hardwareLayoutFilePath ../config/HardwareLayout.json --partitionLayoutFilePath ../config/PartitionLayout.json > logs/server.log &

Through this command, we configure the log4j properties, provide the server with configuration options and cluster definitions and redirect output to a log. Note down the process ID returned (serverProcessID) because it will be needed for shutdown.
The log will be available at logs/server.log. Alternately, you can change the log4j properties to write the log messages to a file instead of standard output.

Step 3: Deploy a frontend.
$ nohup java -Dlog4j.configuration=file:../config/log4j.properties -cp "*" com.github.ambry.frontend.AmbryFrontendMain --serverPropsFilePath ../config/frontend.properties --hardwareLayoutFilePath ../config/HardwareLayout.json --partitionLayoutFilePath ../config/PartitionLayout.json > logs/frontend.log &

Note down the process ID returned (frontendProcessID) because it will be needed for shutdown. Make sure that the frontend is ready to receive requests.

$ curl http://localhost:1174/healthCheck
GOOD

The log will be available at logs/frontend.log. Alternately, you can change the log4j properties to write the log messages to a file instead of standard output.
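
If you start the frontend from a script, the readiness check above can be automated. The following is a minimal sketch, not part of Ambry: it polls the health check URL used above until the body reads GOOD. The function names, the number of attempts and the delay are all assumptions chosen for illustration.

```python
import time
import urllib.request

# Frontend health check URL used in the step above.
HEALTH_URL = "http://localhost:1174/healthCheck"


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """Return the health check body, or None if the frontend is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:
        # urllib.error.URLError and socket timeouts are subclasses of OSError.
        return None


def wait_until_healthy(fetch=fetch_health, attempts=30, delay=1.0):
    """Poll until the health check returns GOOD. Returns True on success."""
    for attempt in range(attempts):
        if fetch() == "GOOD":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_healthy() after Step 3 returns True once the frontend answers GOOD, and False if it never does within the allowed attempts.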

Step 4: Interact with Ambry!

We are now ready to store and retrieve data from Ambry. Let us start by storing a simple image. For demonstration purposes, we will use an image demo.gif that has been copied into the target folder.

POST
$ curl -i -H "x-ambry-service-id:CUrlUpload"  -H "x-ambry-owner-id:`whoami`" -H "x-ambry-content-type:image/gif" -H "x-ambry-um-description:Demonstration Image" http://localhost:1174/ --data-binary @demo.gif
HTTP/1.1 201 Created
Location: AmbryID
Content-Length: 0

The curl command creates a POST request that contains the binary data in demo.gif. Along with the file data, we provide headers that act as blob properties. These include the size of the blob, the service ID, the owner ID and the content type.
In addition to these properties, Ambry also has a provision for arbitrary user defined metadata. We provide x-ambry-um-description as user metadata. Ambry does not interpret this data and it is purely for user annotation. The Location header in the response is the blob ID of the blob we just uploaded.
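
The same POST can be issued from a program. The sketch below is an illustration using only the Python standard library and the header names from the curl command above. The helper names build_upload_request and upload are hypothetical, and the sketch assumes the frontend is listening on localhost:1174 as in the earlier steps.

```python
import getpass
import urllib.request

# Frontend address from the steps above.
FRONTEND_URL = "http://localhost:1174/"


def build_upload_request(data, content_type, service_id, owner_id=None, description=None):
    """Build (but do not send) a POST request that carries the blob bytes."""
    headers = {
        "x-ambry-service-id": service_id,
        # Mirrors `whoami` in the curl example when no owner is given.
        "x-ambry-owner-id": owner_id or getpass.getuser(),
        "x-ambry-content-type": content_type,
    }
    if description is not None:
        # User metadata: Ambry stores it but does not interpret it.
        headers["x-ambry-um-description"] = description
    return urllib.request.Request(FRONTEND_URL, data=data, headers=headers, method="POST")


def upload(request):
    """Send the request and return the blob ID from the Location header."""
    with urllib.request.urlopen(request) as resp:
        return resp.headers["Location"]
```

Building the request separately from sending it makes it easy to inspect the headers before anything reaches the frontend. With a running frontend, upload(build_upload_request(open("demo.gif", "rb").read(), "image/gif", "CUrlUpload")) would return the blob ID.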

GET - Blob Info

Now that we have stored a blob, let us verify some properties of the blob we uploaded.

$ curl -i http://localhost:1174/AmbryID/BlobInfo
HTTP/1.1 200 OK
x-ambry-blob-size: {Blob size}
x-ambry-service-id: CUrlUpload
x-ambry-creation-time: {Creation time}
x-ambry-private: false
x-ambry-content-type: image/gif
x-ambry-owner-id: {username}
x-ambry-um-description: Demonstration Image
Content-Length: 0
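
A client will usually want to separate blob properties from user metadata in this response. The sketch below assumes, based only on the header names in the examples here, that user metadata headers start with x-ambry-um- and that the remaining x-ambry- headers are blob properties. The function name split_blob_info is hypothetical.

```python
# Prefixes inferred from the example headers; treat them as assumptions.
USER_METADATA_PREFIX = "x-ambry-um-"
AMBRY_PREFIX = "x-ambry-"


def split_blob_info(headers):
    """Split response headers into (blob properties, user metadata) dicts."""
    properties, user_metadata = {}, {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if key.startswith(USER_METADATA_PREFIX):
            user_metadata[key[len(USER_METADATA_PREFIX):]] = value
        elif key.startswith(AMBRY_PREFIX):
            properties[key[len(AMBRY_PREFIX):]] = value
        # Anything else (for example Content-Length) is ignored.
    return properties, user_metadata
```

The user metadata prefix is checked first because it also matches the more general x-ambry- prefix.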

GET - Blob

Now that we have verified that Ambry returns properties correctly, let us obtain the actual blob.

$ curl http://localhost:1174/AmbryID > demo-downloaded.gif
$ diff demo.gif demo-downloaded.gif 
$

This confirms that the data that was sent in the POST request matches what we received in the GET. If you would like to see the image, simply point your browser to http://localhost:1174/AmbryID and you should see the image that was uploaded!
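
The empty output from diff means the two files are identical. If you want the same check inside a test script, one option is to compare digests of the uploaded and downloaded bytes. This is a minimal sketch: the helper names and the choice of SHA-256 are ours and have nothing to do with how Ambry itself verifies data.

```python
import hashlib


def read_chunks(path, chunk_size=64 * 1024):
    """Yield the contents of a file in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def digest_of(chunks):
    """Return the SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hold exactly the same bytes."""
    return digest_of(read_chunks(path_a)) == digest_of(read_chunks(path_b))
```

For the files above, same_content("demo.gif", "demo-downloaded.gif") should be True after a successful round trip.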

DELETE

Ambry is an immutable store: blobs cannot be updated, but they can be deleted to make them irretrievable. Let us go ahead and delete the blob we just created.

$ curl -i -X DELETE http://localhost:1174/AmbryID
HTTP/1.1 202 Accepted
Content-Length: 0

You will no longer be able to retrieve the blob properties or data.

$ curl -i http://localhost:1174/AmbryID/BlobInfo
HTTP/1.1 410 Gone
Content-Type: text/plain; charset=UTF-8
Content-Length: 17
Connection: close

Failure: 410 Gone
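
A client needs to treat 410 Gone as a normal outcome rather than a crash. The sketch below covers only the status codes that appear in this walkthrough (200, 201, 202 and 410). The function names are hypothetical, and any other status is reported as unexpected because this README does not describe it.

```python
import urllib.error
import urllib.request


def describe_status(status):
    """Map the status codes shown in this walkthrough to a short description."""
    if status == 200:
        return "available"
    if status == 201:
        return "created"
    if status == 202:
        return "delete accepted"
    if status == 410:
        return "gone"
    return "unexpected"


def blob_info_status(blob_url):
    """Return the HTTP status of GET <blob_url>/BlobInfo without raising on 410."""
    try:
        with urllib.request.urlopen(blob_url + "/BlobInfo") as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
```

With a running frontend, describe_status(blob_info_status("http://localhost:1174/AmbryID")) should report "gone" after the DELETE above.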

Step 5: Stop the frontend and server.
$ kill -15 frontendProcessID
$ kill -15 serverProcessID

You can confirm that the services have been shut down by looking at the logs.

Additional information:

Beyond the simple APIs demonstrated above, Ambry supports HEAD requests and GET requests for only the user metadata. In addition to the POST of binary data shown above, Ambry also supports POST of multipart/form-data via curl or web forms. Other features of interest include:

  • Time To Live (TTL): During POST, a TTL in seconds can be provided through the addition of a header named x-ambry-ttl. This means that Ambry will stop serving the blob after the TTL has expired. On GET, expired blobs behave the same way as deleted blobs.
  • Private: During POST, providing a header named x-ambry-private with the value true will mark the blob as private. API behavior can be configured based on whether a blob is public or private.
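
The two optional headers above can be added to any POST. The sketch below only builds the extra headers. The function name optional_post_headers is hypothetical, and the header names and the value true are taken from the list above.

```python
def optional_post_headers(ttl_seconds=None, private=False):
    """Return the optional POST headers for a TTL and for a private blob."""
    headers = {}
    if ttl_seconds is not None:
        # TTL is given in seconds; header values are sent as strings.
        headers["x-ambry-ttl"] = str(int(ttl_seconds))
    if private:
        headers["x-ambry-private"] = "true"
    return headers
```

For example, optional_post_headers(ttl_seconds=3600, private=True) yields the headers for a private blob that Ambry stops serving after one hour. Merge them into the headers of the POST shown in Step 4.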