mdoify set-design-en
gengxinMT committed Mar 11, 2019
1 parent 33f2c16 commit 42b2731
Showing 1 changed file with 46 additions and 49 deletions.
95 changes: 46 additions & 49 deletions proposals/set-design-en.md
@@ -47,106 +47,103 @@ type SetMeta struct {
## Command processing
### General command processing
#### SAdd key member [member ...]
* Add an element to the set corresponding to the key.

* Add the specified members to the set stored at key. Specified members that are already a member of this set are ignored. If key does not exist, a new set is created before adding the specified members.

**Implementation steps**

* Deduplicate the members passed in
* Call BatchGetValues to fetch the values of the corresponding datakeys in one batch, and judge whether each member already exists according to the type of its value
* Filter out existing members and count the number of new members
* Check the members passed in for duplicates and remove any that are found.
* Call BatchGetValues to fetch the values of the corresponding datakeys in a single batch, and determine whether each member exists from the type of its value.
* Filter out the members that already exist and count the newly added members.
* Update the meta information and return the number of new members
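
The steps above can be sketched in Go as follows. This is only an illustration under stated assumptions: a plain map stands in for the store, and `dedupe` and `sadd` are hypothetical names. The single batched lookup that the steps attribute to BatchGetValues is replaced here by per-member map lookups, and the meta update is implied by the map size.

```go
package main

import "fmt"

// dedupe removes repeated members while keeping first-seen order.
func dedupe(members []string) []string {
	seen := make(map[string]struct{}, len(members))
	out := make([]string, 0, len(members))
	for _, m := range members {
		if _, ok := seen[m]; ok {
			continue
		}
		seen[m] = struct{}{}
		out = append(out, m)
	}
	return out
}

// sadd adds members to store (a hypothetical in-memory stand-in for the
// real storage) and returns how many of them were not present before.
func sadd(store map[string]struct{}, members ...string) int {
	added := 0
	for _, m := range dedupe(members) {
		if _, exists := store[m]; exists {
			continue // existing members are ignored
		}
		store[m] = struct{}{}
		added++
	}
	return added
}

func main() {
	store := map[string]struct{}{"a": {}}
	fmt.Println(sadd(store, "a", "b", "b", "c")) // prints 2
}
```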


#### SMembers key

* Returns all members of the set stored at key
* Returns all the members of the set value stored at key.
**Implementation steps**

* Use an iterator to find the location of the spliced prefix in the store
* returns all elements with the same prefix
* Use an iterator to find the location of the spliced prefix in the storage
* Return all the elements with the same prefix
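
A minimal Go sketch of the prefix scan described above, assuming the store's keys are available as a sorted slice of strings. The key layout (`"s1:" + member`) and the name `scanPrefix` are hypothetical and chosen only for illustration; the real datakey encoding is not shown in this section.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scanPrefix returns the member part of every key that starts with prefix.
// keys must be sorted, so all matching keys form one contiguous run that
// begins at the first key >= prefix.
func scanPrefix(keys []string, prefix string) []string {
	start := sort.SearchStrings(keys, prefix) // seek to the prefix
	var members []string
	for i := start; i < len(keys) && strings.HasPrefix(keys[i], prefix); i++ {
		members = append(members, strings.TrimPrefix(keys[i], prefix))
	}
	return members
}

func main() {
	keys := []string{"s1:a", "s1:b", "s2:a"}
	fmt.Println(scanPrefix(keys, "s1:")) // prints [a b]
}
```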

#### SCard key
* Returns the number of elements in the set stored at key.
* Returns the set cardinality (number of elements) of the set stored at key.

**Implementation steps**

* return Len in the meta information
* Return the Len property of the meta data

#### SIsmember key member
* Determines whether the element member is a member of the set key
* Returns whether member is a member of the set stored at key.

**Implementation steps**

* Look for the existence of a spliced datakey in the storage
* returns 1 if it exists, or 0 if it does not
* Try looking for the spliced datakey
* If it exists, return 1; otherwise return 0

#### SPop key

* Returns and deletes an element of the set corresponding to the key at random. Since the data is stored in TiKV in an orderly manner, you simply delete and return the key corresponding to the first element in the set
* Removes and returns one or more random elements from the set value stored at key.

**Implementation steps**

* Delete the key corresponding to the first element in the set
* Update meta information to return deleted member
* Delete the first element of the set given by the key
* Update meta information and return the deleted member


#### SRem key member [member ...]
* Removes one or more element members specified in the collection key. Member is ignored if it does not exist.

* Remove the specified members from the set stored at key.

**Implementation steps**

* Look for the existence of a spliced datakey in the storage
* If it exists, call delete to remove it
* Try looking for the spliced datakey
* If it exists, call delete to remove it.
* Update the meta information

#### SMove key key1 member
* Moves member from the source set corresponding to key into the target set corresponding to key1. If the source set does not exist or does not contain the specified element, no operation is performed and 0 is returned. Otherwise, the element is deleted from the source set and added to the target set.

**Implementation steps**

* First, check whether member belongs to the source set corresponding to key; if it does not, return directly
* Next, determine whether member exists in the target set corresponding to key1. If not, add member to that set and update its meta information
* Last, delete member from the source set corresponding to key and update its meta information
* Move member from the set at source to the set at destination.

**Implementation steps**

### Collective command processing
* First, check if the member is an element of the source set given by key. If not, return.
* Next, check if the member is an element of the target set given by key1. If not, add the member to the target set and update its meta information.
* Lastly, delete the member from the source set and update its meta information.
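
The three steps can be sketched in Go as below. The `set` type and the `smove` function are hypothetical in-memory stand-ins: a real implementation would read and write datakeys and meta records in the store, presumably inside one transaction, which this sketch does not model.

```go
package main

import "fmt"

// set is an in-memory stand-in for a stored set (assumption for this sketch).
type set map[string]struct{}

// smove moves member from src to dst and returns 1, or returns 0 without
// changing anything when src does not contain member. dst must be non-nil.
func smove(src, dst set, member string) int {
	// Step 1: the member must be in the source set.
	if _, ok := src[member]; !ok {
		return 0
	}
	// Step 2: add it to the target set only if it is not there yet.
	if _, ok := dst[member]; !ok {
		dst[member] = struct{}{}
	}
	// Step 3: remove it from the source set.
	delete(src, member)
	return 1
}

func main() {
	src := set{"a": {}, "b": {}}
	dst := set{"c": {}}
	fmt.Println(smove(src, dst, "a"), len(src), len(dst)) // prints 1 1 2
	fmt.Println(smove(src, dst, "z"))                     // prints 0
}
```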

### Set-based command processing

For the set-operation commands, the most intuitive implementation is to read all members into memory and then compute the intersection, union, or difference. Although part of the computation can be optimized by deduplicating and sorting through a map, memory is still tight when the number of elements in a set is particularly large. Because the members stored under each key are kept in order, we can borrow the idea of a merge to complete the set operation. The specific implementation is as follows.
For the set-operation commands, the most intuitive way to compute intersections, unions, and differences is to read all members into memory first. Although performance can be partially improved by using a map for deduplication and sorting, memory can still run short when the sets contain a very large number of elements. Because the members of each set are stored in order, an alternative is to process the sets in the style of a merge. The detailed implementation is as follows.

#### SUnion -- find the set union corresponding to the given key
#### SUnion -- Returns the members of the set resulting from the union of all the given sets.
##### Implementation steps

1. Set a pointer (iter) for the set corresponding to each key, pointing to the first member(key) of the current set. Since members are stored in order, the first member must be the smallest in the current set.
2. Compare the members the pointers refer to; there are two situations
1. Set a pointer (iter) for the set at each given key. The pointer is initialized to the first member of that set. Since members are stored in order, the first member must be the smallest one in the current set.
2. Compare the members the pointers refer to and handle the following cases according to the result.

* if the members are equal, they are the same element; record this member as part of the union result
* if the members are not equal, record the smallest member as part of the union result and advance its pointer to the next member
* If the members are equal, they are the same element; record this member once as part of the union and advance every pointer that refers to it.
* If the members differ, record the smallest member as part of the union and advance its pointer by one step.

3. Repeat step 2 until every set has been fully traversed. If only one set is left after the comparison, the remaining elements of that set are taken as part of the union result, and the union result is returned.
3. Repeat step 2 until every set has been fully traversed. As a special case, if only one set still has members left, append its remaining elements to the union. Finally, return the union.
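
The merge-style union can be illustrated with the following Go sketch. It assumes each set is already available as a sorted slice of unique strings, which stands in for an ordered iterator over the store; `unionSorted` is a hypothetical name, not a function from the code base.

```go
package main

import "fmt"

// unionSorted merges sorted, duplicate-free slices into one sorted union.
// pos[i] plays the role of the pointer (iter) into sets[i].
func unionSorted(sets [][]string) []string {
	pos := make([]int, len(sets))
	var out []string
	for {
		// Find the smallest member among the current pointer positions.
		smallest, found := "", false
		for i, s := range sets {
			if pos[i] < len(s) && (!found || s[pos[i]] < smallest) {
				smallest, found = s[pos[i]], true
			}
		}
		if !found {
			return out // every set has been fully traversed
		}
		out = append(out, smallest)
		// Advance every pointer that refers to the recorded member, so an
		// element shared by several sets is emitted only once.
		for i, s := range sets {
			if pos[i] < len(s) && s[pos[i]] == smallest {
				pos[i]++
			}
		}
	}
}

func main() {
	fmt.Println(unionSorted([][]string{{"a", "c"}, {"b", "c", "d"}})) // prints [a b c d]
}
```

Only one member per set is held at a time, so memory use depends on the number of sets and the size of the result rather than on the total number of stored members.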

#### SInter -- find the set intersection corresponding to the given key
#### SInter -- Returns the members of the set resulting from the intersection of all the given sets.
##### Implementation steps

1. Meta information is read when each set object is created, and a nonexistent set is treated as an empty set. Once there is an empty set, there is no need to continue the calculation, and the final intersection is the empty set.
2. If no empty set exists, a pointer (iter) is set for the set corresponding to each key, pointing to the first member(key) of the current set. Since members are stored in order, the first member must be the smallest in the current set.
3. Compare the members the pointers refer to; there are the following situations
* if the members are all equal, they are the same element; record this member as part of the intersection result, and advance the pointers of all sets by one position
* if the members are not equal, the smaller elements can never appear in the other sets, so advance every pointer (iter) except the one at the largest member by one position
4. Repeat step 3 until a pointer passes the end of its sequence. At that point, the set that was exhausted first has completed all of its seeks.
1. Read meta information while constructing each set object. If a set does not exist, regard it as an empty set. Once an empty set is found, stop calculating, because the intersection must then be empty.
2. If no empty set exists, a pointer (iter) is set for the set at each given key. The pointer is initialized to the first member of that set. Since members are stored in order, the first member must be the smallest one in the current set.
3. Compare the members the pointers refer to and handle the following cases according to the result.
* If the members are all equal, they are the same element; record this member as part of the intersection and advance all pointers (one for each set) by one step.
* If the members differ, the smaller elements cannot be members of the other sets, so advance every pointer by one step except the one at the largest member.
4. Repeat step 3 until some pointer passes the end of its sequence. At that point one set has been fully traversed, so exit the process and return the intersection result.
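
A Go sketch of the intersection steps, again assuming each set is a sorted slice of unique strings standing in for an ordered iterator. `interSorted` is a hypothetical name. Instead of advancing "every pointer except the one at the largest member", the sketch advances every pointer whose member is smaller than the largest one, which leaves pointers tied at the largest member in place.

```go
package main

import "fmt"

// interSorted returns the members common to all sorted, duplicate-free slices.
func interSorted(sets [][]string) []string {
	if len(sets) == 0 {
		return nil
	}
	pos := make([]int, len(sets)) // pos[i] is the pointer (iter) into sets[i]
	var out []string
	for {
		// Stop as soon as any set is exhausted; this also covers an
		// empty input set, for which the intersection is empty.
		for i, s := range sets {
			if pos[i] >= len(s) {
				return out
			}
		}
		// Find the largest current member and check whether all are equal.
		largest := sets[0][pos[0]]
		allEqual := true
		for i, s := range sets {
			cur := s[pos[i]]
			if cur != largest {
				allEqual = false
			}
			if cur > largest {
				largest = cur
			}
		}
		if allEqual {
			out = append(out, largest)
			for i := range pos {
				pos[i]++
			}
			continue
		}
		// Members smaller than the largest cannot be in every set.
		for i, s := range sets {
			if s[pos[i]] < largest {
				pos[i]++
			}
		}
	}
}

func main() {
	sets := [][]string{{"a", "b", "d"}, {"b", "c", "d"}, {"b", "d", "e"}}
	fmt.Println(interSorted(sets)) // prints [b d]
}
```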


#### SDiff key [key]... -- find the difference between the set corresponding to the first key and the sets corresponding to the other given keys
##### Implementation steps
#### SDiff key [key]... -- Returns the members of the set resulting from the difference between the first set and all the successive sets.

##### Implementation steps

1. Set a pointer (iter) for the set corresponding to each key, pointing to the first member(key) of the current set. Since members are stored in order, the first member must be the smallest in the current set.
2. Take the first key specified as the benchmark key, compare its member with the members of the other keys, and handle the following situations
* if the member is equal to a member of one of the following keys, it is the same element, so the member is not part of the difference set. Advance the pointer of the benchmark key and the pointer of the key with the same member by one position
* if not, advance the pointer of the key with the smallest member; if that key is the benchmark key, first record the member its pointer refers to as part of the difference set
3. Repeat step 2 until all members of the sets have been traversed, then return the difference set

1. Set a pointer (iter) for each set with a specific key. The pointer is initialized to the first member of the key set. Since members are stored in order, the first member must be the smallest one in the current set.
2. The first key denotes the benchmark set. Compare the other sets (specified by [key ...]) with the benchmark set and handle the following cases.
* If the members are equal, the member cannot be part of the difference set. Advance both the pointer of the benchmark set and the pointer of the other set by one step.
* If the members differ, advance the pointer of the set with the smallest member by one step. If that set is the benchmark set, first record its current member as part of the difference set.

3. Repeat step 2 until all the members of each set have been traversed, then return the difference set.
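
The difference steps can be sketched in Go as follows, under the same assumption that each set is a sorted slice of unique strings standing in for an ordered iterator. `diffSorted` is a hypothetical name. The sketch walks the benchmark set once and, for each of its members, advances the pointers of the other sets past anything smaller, which is one way to realize the pairwise comparison described above.

```go
package main

import "fmt"

// diffSorted returns the members of base that appear in none of the others.
// All slices must be sorted and duplicate-free.
func diffSorted(base []string, others ...[]string) []string {
	pos := make([]int, len(others)) // pos[i] is the pointer (iter) into others[i]
	var out []string
	for _, m := range base {
		found := false
		for i, s := range others {
			// Skip members smaller than m; they can never match m or
			// any later member of the benchmark set.
			for pos[i] < len(s) && s[pos[i]] < m {
				pos[i]++
			}
			if pos[i] < len(s) && s[pos[i]] == m {
				found = true
			}
		}
		if !found {
			out = append(out, m) // m belongs to the difference set
		}
	}
	return out
}

func main() {
	base := []string{"a", "b", "c", "d"}
	fmt.Println(diffSorted(base, []string{"b"}, []string{"d", "e"})) // prints [a c]
}
```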
