- Collect power consumption for GPUs (#1722).
- Properly eject spot committed ask-plans (#1725).
Committed spot ask-plans were not subtracted when calculating available resources, so the same resource could sometimes be sold multiple times. Now everything should work properly.
WARNING: we recommend that everyone update their Workers; otherwise, they may get blacklisted.
- Allow disabling Worker benchmarking in dev mode (#1707).
- Worker geolocation (#1692).
Workers can now detect the ISO code of the country where they are located and report it in the status handler.
- Ability to collect metrics from DRI devices (#1697).
On device initialization we now try to find the device's hwmon path. If it is found, the monitoring interface becomes available for that card. The "Monitoring" method reads data from the "/sys/dev" internals; all of this behavior was ported from the ROC-smi tool.
- Extended network metrics in Worker (#1691).
Extended network metrics are now returned when requesting container/task stats. They include minute-averaged traffic and packet rates, which are required to determine the network pressure caused by a task.
- Add PayoutTargeted to Gatekeeper (#1695).
This PR adds a "PayoutTargeted" function to the sidechain gatekeeper (sidechain only; the masterchain gate does not need it). PayoutTargeted creates a transfer to the targeted destination.
- Added used bytes for the disk benchmark (#1687).
This avoids making two syscalls to gather disk space and will also be useful for the Worker monitoring suite.
- Return order IDs after price prediction (#1682).
- SSH keys management (#1681).
This PR allows specifying untrusted and leaked ETH addresses that may not be used for remote SSH login. The list includes our public test key, which is widely used for testing; more addresses can be specified in the Worker's config.
- Optimus should no longer remove active plans (#1679).
Previously, after the optimization process finished, Optimus replaced some ask-plans with new ones, dropping currently running deals. This caused painful races across the entire SONM network: all running bots greedily went after a new profitable order, but only one of them could succeed. New logic in the Worker should eliminate this race, and Optimus now supports it: it cancels an ask-plan only if no deal is associated with it, delegating the rest of plan management to the Worker.
- SSH on Worker host (#1644).
It is now possible to SSH into a Worker through the Node proxy, which is especially useful for debugging. Internally it works the same way as SSH into containers, except that the user identity can now be the Worker's public ETH address.
- Avoid eth addr in connection string for insecure connections (#1719).
- Allow mounting volumes only for privileged KYC levels (#1717).
- Record deal payments (DWH) (#1714).
Deal payments were not being recorded to the DB for some reason. Now they should be.
- Avoid nil pointer dereferences if DevConfig is empty (#1713).
- Add mount.cifs package to debian dependencies (#1710).
- Properly restore worker state on restart (#1698).
This PR fixes a bug where the Worker forgot about its tasks after a restart.
- Properly initialize GPU resources before subtraction (#1686).
- Validate node addr from config (#1684).
- Failed genetic model should not interrupt optimization (#1685).
If a genetic model in Optimus fails to find a genome with non-zero fitness, it no longer interrupts the entire optimization process and writes a log message instead. This should also fix an issue in the price predictor.
- Proper version for nvidia plugin (#1678).
Fixes a situation where the NVIDIA drivers were updated, but Docker still knew only about the volume with the previous driver version. Now the plugin obtains the driver version and creates a new volume if it does not exist.
- Pool processor for UleyPool (#1650).
This commit adds a new pool processor for Connor's anti-fraud system. It can be enabled with the "antifraud.pool_processor.format: uley" option.
- Check error in Node's stream interceptor (#1661).
- GetTransactionReceipt To field unmarshalling (#1660).
- Proper error code for Worker (#1647).
- Replace using "pow" with hand-written code (#1667).
- Optimize reallocations in Optimus (#1662).
- Cache GPU benchmarks in Optimus (#1659).
Another ~2x speedup.
- Cache benchmarks meta in Optimus (#1658).
This unexpectedly gives about a 100x performance boost in the "consumeGPU" method; overall performance has increased ~2x.
- Pre-filter GPUs in Optimus (#1657).
This increases the relative performance of the "consumeGPU" method by ~60%.
- Blacklist purge method (#1639).
- Ability to stop multiple tasks (#1504).
This commit adds a new method to the Node API that allows stopping several tasks at once.
- Add axe optimization model (#1634).
This is the model from the winner of our Programming Challenge.
- Show hourly expenses for deals (#1515).
- Common pool processor for anti-fraud (#1552).
The idea is very similar to "commonLogProcessor": this commit extracts the common parts of the pool processor into a reusable structure, so new pools can be added simply by implementing the "updateFunc" function.
- Forward deals support in Optimus (#1632).
Optimus can now optimize orders targeting forward deals in addition to the spot deals it supported previously. The config specifies the maximum deal duration that Optimus will take into account.
- Expose DWH API on Node (#1626).
- Add more REST interceptors - logging and tracing (#1623).
- Return REST errors as JSON (#1621).
This commit slightly changes the REST server behavior by forcing errors to be returned as JSON for the sake of consistency. Previously all errors were returned as plain text; now we set the application/json header and wrap the error into a JSON object. The encoding/decoding logic was also slightly changed: if a Node is configured in secure mode, it now returns all responses as an encoded byte array. This was done for symmetry, because a client with an encoder expects an encoded response. Previously, if a request failed at the URI parsing stage, a plain-text reply was returned, which was unexpected and completely unhelpful for people trying to work with a secure Node in an insecure manner: the error says something, but you cannot do anything with the Node anyway unless you have a client-side encoder/decoder.
- Enable some of gRPC interceptors for REST (#1620).
- Teach Optimus to cancel optimization (#1622).
Optimus can now stop its optimization process if the caller is no longer waiting for the result.
- Validate prediction request (#1617).
- DWH out-of-sync mode (#1604).
- Generalized log parser for Connor (#1595).
This commit replaces the specific log parsers with a single one whose settings can be applied to (almost) any log with a number in its fields. The log reader now has 3 options related to log line analysis:
  - A string pattern to detect relevant lines. E.g., a line that contains 'Total speed' will be analyzed; any other line is skipped.
  - The field number in the line: the position of the numeric value that shows task quality (the hash rate, actually).
  - The multiplier for the parsed value, used when a field contains the value in KHash/s or MHash/s. Use "1" if no multiplier applies.
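The three options map naturally onto a small parser. This Go sketch assumes a zero-based, whitespace-separated field index; `logParserConfig` and `parseLine` are illustrative names, not Connor's real config keys or functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// logParserConfig mirrors the three options; field names are hypothetical.
type logParserConfig struct {
	Pattern    string  // substring that marks a line worth analysing
	Field      int     // zero-based index of the numeric field (assumption)
	Multiplier float64 // e.g. 1e6 for MHash/s; 1 if not applicable
}

// parseLine returns the hash rate found in line, or ok=false if the line
// should be skipped.
func parseLine(cfg logParserConfig, line string) (value float64, ok bool) {
	if !strings.Contains(line, cfg.Pattern) {
		return 0, false
	}
	fields := strings.Fields(line)
	if cfg.Field < 0 || cfg.Field >= len(fields) {
		return 0, false
	}
	v, err := strconv.ParseFloat(fields[cfg.Field], 64)
	if err != nil {
		return 0, false
	}
	return v * cfg.Multiplier, true
}

func main() {
	cfg := logParserConfig{Pattern: "Total speed", Field: 2, Multiplier: 1e6}
	if v, ok := parseLine(cfg, "Total speed: 12.5 MH/s"); ok {
		fmt.Println(v)
	}
}
```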
- Show trace info in Rendezvous logs (#1601).
- Price prediction for suppliers (#1594).
The Node service can now obtain an approximate predicted price for a device configuration, i.e., it implements price prediction for suppliers.
- Adapt extra orders in Connor (#1591).
This commit adds a config option that changes the "restoreMarketState" behavior. If the "no_cancel" parameter is enabled, Connor adapts orders that do not fit into the target order set instead of cancelling them.
- Single shot ask plans (#1541).
This PR changes ask-plan behavior: an ask-plan is now removed after its order is canceled or its deal is closed. This works better with the Optimus bot, which performs continuous optimization, and also simplifies the code, allowing separate scheduling for forward deals.
- Set proper status on container exit (#1643).
- Chunked deals fetching in Optimus (#1642).
As the number of deals keeps growing, fetching the entire deal data set at once can exceed the allowed gRPC response size. This commit fixes that by fetching deals in chunks.
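Chunked fetching boils down to a paging loop that stops on a short page. In this Go sketch, `fetchPage` is a hypothetical stand-in for an offset/limit DWH query; the actual request fields Optimus uses may differ.

```go
package main

import "fmt"

// fetchPage is a hypothetical paged query returning up to limit items
// starting at offset.
type fetchPage func(offset, limit int) ([]int, error)

// fetchAll pulls the whole data set in chunks, so no single response grows
// past the transport's message-size limit.
func fetchAll(fetch fetchPage, limit int) ([]int, error) {
	if limit <= 0 {
		return nil, fmt.Errorf("limit must be positive, got %d", limit)
	}
	var all []int
	for offset := 0; ; offset += limit {
		page, err := fetch(offset, limit)
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if len(page) < limit {
			// A short (or empty) page means there is nothing left.
			return all, nil
		}
	}
}

func main() {
	data := make([]int, 25)
	for i := range data {
		data[i] = i
	}
	fetch := func(offset, limit int) ([]int, error) {
		if offset >= len(data) {
			return nil, nil
		}
		end := offset + limit
		if end > len(data) {
			end = len(data)
		}
		return data[offset:end], nil
	}
	all, err := fetchAll(fetch, 10)
	fmt.Println(len(all), err)
}
```

The same loop applies to orders, which hit the gRPC frame size limit for the same reason.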
- Zero values in DWH (#1638).
- Race condition while resetting puncher (#1637).
This commit fixes a thread-safety issue while handling Rendezvous errors. Previously the puncher was reset in another thread, which could cause a race condition, leading to an unrecoverable error returned from an acceptor and, subsequently, a Worker stop. Now all of this is done in the same thread that handles the puncher.
- Compare only victim price in Optimus (#1636).
- Forward meta headers to worker interceptor (#1633).
This commit adds the ability to set the worker header via an HTTP/1.1 request. The header is used by the WorkerManagement interceptor to create a connection.
- Decrease connection timeout in sonmmon (#1630).
- Proper worker startup, removing concurrent container cleanup (#1602).
This PR fixes behavior where containers were concurrently removed in the event watcher, resulting in an error in the OnDealFinish call (e.g., inability to push an already removed container to the registry). If this happened on startup, the Worker would fail to start.
- No SIGSEGV while logging requests (#1628).
- Proper full method name for REST server (#1629).
- Don't init Connor's backends twice (#1606).
- Properly generate hash (#1616).
- Sonmmon should not depend on sonmcli (#1618).
- Dedup orders while doing genetics in Optimus (#1611).
The God of Random in Optimus sometimes put the same order into a genome several times during genetic optimization, which resulted in an invalid knapsack. This commit fixes that by deduplicating orders while crossing over and mutating genomes.
- Default value for new benchmarks in Optimus (#1610).
When a new benchmark is added, the Worker must be restarted to update its benchmarks. However, when doing supplier price prediction, it is possible to forget to set some of the benchmarks, which causes Optimus to ignore those values completely and produce an invalid prediction. This commit fixes that behavior.
- Wait for status before starting X (#1605).
- Properly remove ask plan (#1593).
- Set JSON content type for REST responses (#1599).
- Chunked fetching orders in Optimus (#1590).
We hit the gRPC frame size limit when fetching too many orders from the DWH. This commit applies paging to avoid this problem.
- Remove redundant and dangerous channel closing (#1592).
- Proper error handling in MultiSig API (#1584).
- Bump version in Makefile (#1588).
- Relay asynchronously (#1589).
Do not block on a meeting while holding a lock.
- Socket leak in Relay (#1580).
Several weeks ago a bug was introduced in the Relay server that leaked half of the remote peer sockets forever. It was caused by improper synchronization at the rendezvous point, where the remote server and client meet: when a client finished, we closed only the server socket but not the client one, and vice versa. This patch fixes that behavior; everything should now work fine, at least as far as dead socket collection is concerned. Closes #1566.
- Handle error if failed to create a new account (#1583).
- Check for deal existence before processing (#1557).
A race can occur when the "waitForDeal" timer ticks after the "waitForExternalUpdates" one. The deal is treated as externally opened and its processing starts. Then "waitForDeal" ticks, detects a new deal opened for a known order, and adds the same deal to processing again. This commit adds a check that the deal is not already in the state storage before starting the processing routine.
- Introducing GetOrdersByIDs method in DWH (#1570).
This PR introduces a DWH API method that returns orders with the requested IDs.
- Replace "--out=json" with just "--json" flag (#1559).
- Update ethereum, build on go 1.11 (#1564).
Now the project can be built on go 1.11.
- Print CPU model in worker devices cmd (#1563).
- Password flag for sonmcli login command (#1551).
This commit adds the ability to pass a "--password=topsecret" flag to the "sonmcli login" command, which is useful for the auto-installer script.
- Keep internal state for Connor (#1545).
- DWH stats handle (#1540).
The new GetStats() handle returns current DWH stats.
- Activate nice market API on Worker matcher (#1534).
This shows why a Worker's matcher failed to open a deal, instead of just "transaction failed".
- Show CPU cryptonight benchmark in CLI (#1536).
- gRPC rate limiter (#1525).
This commit allows restricting Worker and Node service usage via a rate limiter interceptor, which can be configured with both global and precise (per-method) settings. This is required to fend off some simple DoS attempts, although it is not ideal. The idea is to keep an RPS counter for each remote peer, identified by the ETH address extracted from the request. The counter uses an EWMA internally, which suits an unpredictable load profile. When this EWMA exceeds the specified threshold, an "Unavailable" error is returned. Closes #1495.
- Btrfs based storage quotas (#1034).
It is based on Btrfs subvolumes and quota group limits. Each subvolume can have its own limit (Docker sets it if you ask), and several subvolumes can be added to a quota group to share a limit (https://btrfs.wiki.kernel.org/index.php/Quota_support); playing with quota groups allows us to provide such a shared limit. When a container is created, we look up its rootfs directory. WARNING: Docker does not expose this API, so some dark magic is applied to fetch the btrfs graph driver information: we extract it by reading files inside the graph driver directory. The DealID is used as the quota group ID, and a new group is allocated if it does not exist. The subvolume with the container's rootfs is assigned to this group. When the container dies, we remove its subvolume from the quota group; if the quota group becomes empty, it is deallocated.
- Reduce bid price predictor deviation (#1571).
The bid price predictor is now trained on BID orders that were successfully converted into deals. This should reduce the deviation between the predicted price and the actual one. Optimus is now also trained on BID orders, whereas previously all active orders were used.
- Memory leak in Node and everywhere (#1572).
Node and other servers under the medium/high load should no longer suffer from a memory leak.
- Calculate uptime only when benchmarks are finished (#1561).
- Proper version var for cli (#1560).
- Show change request price in USD/h (#1562).
- Use global timeout for task list cmd (#1558).
- Restore 13-th benchmark for Connor (#1538).
- Restore setting version in apps (#1533).
It turns out Go does not allow setting a nested variable through linker arguments; only plain string variables can be set.
- Do not build linux-only tests everywhere (#1529).
- Remove extra benchmark from Connor's config (#1526).
Should be added in #1512.
- Always expand config path (#1519).
Previously, if a user specified a config path like "~/config/node.yaml", the service cmd wrapper did not treat "~" as the home dir. This commit fixes this issue.
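Expanding the home-directory shorthand can be sketched with the standard library, assuming only the common `~` and `~/...` forms need support (`~otheruser/...` is not handled). `expandHome` is a hypothetical helper name, not the service wrapper's actual function.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" in path with the home directory.
func expandHome(path, home string) string {
	switch {
	case path == "~":
		return home
	case strings.HasPrefix(path, "~/"):
		return filepath.Join(home, path[2:])
	default:
		return path
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home dir:", err)
		return
	}
	fmt.Println(expandHome("~/config/node.yaml", home))
}
```

Passing the home directory in as a parameter keeps the helper deterministic and easy to test.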
- Proper xdocker.Reference marshalling (#1524).
This commit fixes xdocker.Reference marshalling so that it can be used with the JSON and YAML encoders.
- Restrict Optimus CPU usage via cgroup (#1244).
This commit allows restricting Optimus CPU usage via cgroups from the config. Since Optimus can consume a significant amount of CPU this can be useful for semi-automatic resource restrictions.
- Disabled anti-fraud modules in Connor (#1370).
This commit adds null log and pool processors for Connor's anti-fraud module. It can be used for testing purposes or if the user wants to run custom (non-mining) tasks with external quality tracking.
- Detect duplicate orders in Optimus (#1359).
During learning, the following situation may arise: a set of orders is planned to replace the existing ones, but some of them are technically equal, i.e., they have the same resources, price, and duration. Optimus used to replace such ask-plans anyway, which cancelled their deals. This commit teaches Optimus to detect duplicate orders when replacing ask-plans with more profitable ones and to leave the duplicates untouched.
- Store Connor's task logs (#1390).
- NPP metrics in Node (#1392).
This commit adds NPP metric collection on the client side, showing exactly how connections are established, which NAT punching technique was used, and how much time it took. There is also a new monitoring service in Node from which the collected metrics can be fetched.
- Counterparty address for Connor's orders (#1401).
This commit adds the ability to set a counterparty address and place orders only for the required supplier. Migration code was also added: if a user changes a non-main benchmark, sets a counterparty, or changes netflags, Connor should detect that the requirements have changed. This is achieved with partial order hashing when deciding whether to restore an existing order.
- Worker address flag for CLI (#1431).
This PR adds the ability to specify a custom Worker address to work with in the CLI.
- Add push_on_stop flag to container desc (#1426).
This PR introduces a push_on_stop flag to the task specification that makes pushing to a remote repository conditional.
- Verbose filtering in Optimus (#1443).
- Log requests in Rendezvous (#1455).
- Extended status for Worker (#1456).
This commit extends the status method with the following data:
  - Master and admin addresses.
  - Whether the benchmarks have passed.
  - Whether the master address is confirmed.
We are forced to start the gRPC server before the benchmarks have passed and the master address is confirmed, so now, on Worker startup, only the WorkerManagement service is registered and no NPP listener is used. When all of the setup routines have completed, we close the gRPC server and re-create it with all services registered and the NPP listener.
- Generate SSH key instead of requiring (#1462).
This commit simplifies SSH server configuration on a Worker by deprecating the "private_key_path" option in the "ssh" section. Now the SSH key is generated at Worker startup and cached in boltdb.
- Connor's log processor for xmrig (#1472).
This commit adds a new log processor type for Connor's anti-fraud module. The new processor reads logs from the xmrig miner, which is used to mine Monero (CryptoNight) on CPUs.
- Process blockchain in Worker in parallel (#1377).
This PR enables parallel blockchain processing in the salesman, which speeds up mass ask-plan creation and purging.
- Show tags in orders list (#1503).
- Proper deal closing when a price is changed in Connor (#1352).
The problem occurred under the following conditions: Connor detects a price deviation, starts replacing orders, and also starts closing non-profitable deals. When a deal is closed, a Worker on the other side can pick up an order that we have scheduled for cancellation. That order turns into a deal that is immediately closed because of its low price, and this cycle repeats until all of the orders have been replaced. This commit fixes that behavior: Connor now checks that the cancel channel is empty before actually closing an active deal.
- Remove repo & tag from an uploaded image (#1353).
- Allow gRPC logger to truncate specific methods output (#1360).
- Proper (un)marshaling for empty SSH keys (#1375).
This commit fixes a couple of bugs introduced in #1281:
  - Failed to start a task without providing an SSH pubkey.
  - Marshalling an empty pubkey crashed the Worker.
- Show allocated resources in TaskStatus (#1368).
This commit fills the "AllocatedResources" field from the deal's ask-plan. Knowing the actual resources makes it possible to show which part of the deal's resources a task has acquired, or at least which GPUs it uses.
- Properly parse docker load output (#1373).
- Kill tasks if deal was closed (#1387).
- Proper init for disabled processor in Connor (#1388).
- Node hanging on failed startup (#1395).
This commit fixes a race where Node started with some failed condition (for example, when the SSH port was occupied) and then hung forever.
- Properly close read/write part of sockets in Relay (#1398).
- Proper deal id comparison (#1408).
- Socket leak in Node (#1424).
This commit fixes socket leakage in Node by draining pending channels in case of timeout errors.
- Unregister closed deal despite the result of order deregistration (#1433).
Under certain circumstances, such as receiving stale data from the blockchain after deal registration or failing to save data in boltdb, an order could be removed before its deal, resulting in a never-ending cycle of deal removal. This PR fixes this behavior.
- Restore deals only if Connor is a consumer (#1427).
- Proper task tag printer (#1451).
- Proper error if the profile doesn't exist (#1450).
- Proper identity level checking for Worker SSH (#1444).
This commit fixes invalid identity level verification when accessing a container over SSH using the NPP API. Previously the ask-plan's identity level was checked, whereas it is the consumer's level that must be checked.
- Properly detect IPv4 addresses (#1460).
For reasons of their own, Go developers keep IPv4 addresses in a slice of size 16 (!), so a simple length check does not detect them.
- The proper formula for XMR price calculation (#1459).
This commit fixes the calculation algorithm for XMR. Also, the signature of the PriceProvider's "calculateFunc" has been changed, because we want to use all available token params, not only the reward and net difficulty.
- Proper env params applying in Connor (#1474).
- Restrict image push with KYC level (#1458).
This PR introduces an ACL that checks the KYC level and allows only users of a specific level to push custom images to the Worker. The ACL is configurable and is reused in the whitelist configuration.
- Drop task if the deal was closed during startup (#1476).
This PR fixes a bug where a deal was closed during task start and the task was not released properly.
- Underflow in price prediction service (#1498).
- Show all balances with output=JSON (#1500).
- Keep container info after task stop (#1441).
- Activate onCertificateUpdated handler in DWH (#1502).
- Configurable EWMA for Connor's processor (#1478).
- Properly initialize tinc network (#1505).
This PR fixes the ability to add a tinc network alongside a shaped interface. As Docker does not accept multiple networks during container creation, we connect them afterwards.
- Log the supplier's address in Connor (#1510).
This commit extends Connor's logger (bound to a deal) with the supplier's ETH address. Also, use proper logger instance in the
- Proper logging on price deviation in Connor (#1511).
Previously we logged the price-per-second-per-hash from the PriceProvider; this commit replaces that value with the full order price per second, which is easier to analyze later.
- Proper log processor for Connor (#1514).
This commit fixes several issues with log processor:
  - Wrap the calculated hashrate value in an atomic to make it thread-safe.
  - Perform an initializing tick for the EWMA.
  - Add timestamps to stored logs.
  - Read logs with tail. This prevents reloading the full task log when Connor restarts, or when the connection is lost and Connor has to start reading logs again.
- Proper systemd config (#1516).
- DWH L2 (#1295).
DWH now stores events history, allowing to perform data analysis over the entire history of our sidechain.
- Close unprofitable deals in Connor (#1310).
This activates price checking for Connor's deals. If the mining profit becomes less than what we pay for a deal, Connor closes it without adding the supplier to the blacklist.
- XMR price provider for Connor (#1321).
This commit replaces the incomplete ZEC price provider with an XMR one. It also replaces the ZEC token with XMR_CPU in the config.
- Activate request logging in Relay (#1329).
Now all gRPC requests will be logged as they come.
- Info handle in Relay (#1328).
This addition activates the info handle in the Relay server, allowing you to view the server's current state, similar to how we do it in Rendezvous.
- Blockchain errors for humans while opening a deal (#1291).
This commit activates a special interceptor on Node that performs userland verification before opening a deal. This allows returning human-readable errors instead of just "Transaction failed" when a deal cannot be opened, which helps to figure out exactly what is wrong with the two orders and why they cannot match.
- Relative price threshold in Optimus (#1334).
This commit allows specifying either an absolute or a relative price threshold in Optimus. It works as follows: if Optimus finds a new, more profitable order set, it computes the relative price difference (in percent) between the new order set and the existing one and compares it with the threshold.
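Assuming a higher total price means a more profitable ask set, the relative check could look like the Go sketch below. `shouldReplace` and its parameters are hypothetical, and the handling of an empty current set is an assumption rather than documented Optimus behavior.

```go
package main

import "fmt"

// shouldReplace reports whether a candidate order set is worth switching to,
// given the total price of the current and candidate sets and a relative
// threshold in percent.
func shouldReplace(currentPrice, candidatePrice, thresholdPercent float64) bool {
	if currentPrice <= 0 {
		// Nothing is placed yet (assumption): any positive candidate wins.
		return candidatePrice > 0
	}
	diff := (candidatePrice - currentPrice) / currentPrice * 100
	return diff >= thresholdPercent
}

func main() {
	fmt.Println(shouldReplace(100, 110, 5)) // true: +10% exceeds 5%
	fmt.Println(shouldReplace(100, 103, 5)) // false: +3% is below 5%
}
```

A relative threshold scales with the size of the current set, so small fluctuations on large sets do not trigger needless replacements.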
- Price predictor service (#1254).
This commit allows optionally activating an order price predictor service on Node. The service continuously trains on the current marketplace state to predict an order's price from its benchmarks and netflags, using a simple multidimensional regression model with non-negative coefficients. It is exposed as a separate gRPC/REST service and can be accessed the same way as other methods.
- Allow passing output to the logger (#1348).
This commit extends the logger config with the desired output: either "stdout", as before, or a path to a log file. Logs written to stdout are colorized; however, redirected descriptors are detected automatically, and colorizing is switched off if the descriptor is not a TTY.
- Remove extra orders in Connor (#1355).
This commit adds code that compares the desired order set with the existing orders on the Market and decides which orders should be removed. Extra orders can appear when the configured hash rate range is changed, for example, decreased.
- Parallel order and deal purging (#1354).
This PR extends the order and deal API with the ability to remove several orders and deals at once (including a full purge). This is done on Node in parallel, reducing the time the operation takes.
- SSH into containers through Node (#1216).
This allows SSH'ing into containers by specifying only the deal and task identifiers, regardless of where those containers are running. This is done by proxying the traffic through the local Node: it hijacks the incoming TCP connection, resolves the real endpoint of the Worker running the container by deal ID using NPP, and then forwards the traffic directly to it. The Worker does not need a public IP address. This is an experimental feature and must be explicitly activated on both the Worker and the Node. Note that on the Node you should also configure an SSH agent so that it can obtain the credentials used to access the container.
- Show only public IPs in deal status (#1335).
Previously all networks, including private ones, were shown, but such information is of no use.
- Drop excessive logging in Connor (#1333).
- Start services when the network becomes online (#1336).
This change touches the default systemd configuration, forcing our services to start strictly after the network is fully configured and up.
- Mark node's Balance method as deprecated (#1357).
Use BalanceOf instead, which is more flexible and allows specifying the target address.
- Counterparty matching in DWH (#1315).
After this fix, DWH should properly match orders, doing the proper identity level check.
- Zero creator identity level in DWH (#1317).
Now a meaningful value is used instead of just zero.
- Gradually decrease blacklisting in Connor (#1318).
This commit changes Worker unblacklisting as follows: each tracked success decreases the next possible blacklist period by the time the task worked correctly, and each failure doubles the period for which a Worker is blacklisted. So if a Worker is unblacklisted after a previous failure (1 hr) and works correctly for 30 minutes, the next blacklist period is 2 hr minus 30 minutes.
- Treat none benchmarks as min in Optimus (#1322).
This fixes a bug where Optimus created ask-plans for orders that specified a CPU cores benchmark value greater than the system has. That resulted in an order that could not be matched with the target one.
- Support old samba servers (#1327).
This fixes a bug where specifying the "vers" option for the CIFS plugin resulted in an error (seen in dmesg) saying that CIFS does not support the "vers" option, which means that either a too old CIFS compatibility layer is installed or the SMB server is old. Now we simply do not pass the "vers" option when it is unspecified.
- Connor should ban a Worker if it cannot start a task (#1326).
- Proper CertificateUpdated event processing (DWH) (#1331).
- Attach antifraud to restarted task (#1330).
Previously, if a task was restarted, the anti-fraud module did not restart log processing for the new task instance. This commit fixes that behavior.
- Use flags to control blacklisting in Connor (#1332).
This commit adds extendable flags to control the anti-fraud behavior in different cases. Also, it fixes a problem where we blacklisted a Worker that picked up a deal we had decided to re-create due to price changes.
- Proper order price setting in Connor (#1337).
This commit moves the price calculation code into the "sendOrderToMarker" method, which allows us to calculate the actual price on each order re-creation. It also fixes a bug where Connor replaced an order on price deviation with an order with exactly the same parameters.
- Show DWH endpoint in worker status (#1343).
Now we show actual DWH endpoint instead of a blank line.
- Remove IFB dev on ask-plan remove (#1347).
We use IFB (intermediate functional block) devices for smoother traffic shaping. However, they were not properly cleaned up after the associated ask-plan was removed, which littered the system with dead link devices. Now everything should be properly cleared.
- No peer found error in Relay (#1350).
This error appeared when a server was under pressure. It happened as follows: a server announced itself and waited for a connection; when a client arrived, it consumed that connection, causing "noPeerFound" errors for other clients that tried to resolve the server at the same time. This commit fixes that behavior by putting clients into a short waiting queue. In the future, we can improve this by checking whether a server was ever seen by Relay.
- Proper task start in Connor (#1349).
This commit increases the task start timeout and makes it configurable. Also, when retrying "StartTask", we check that the previous attempt did not actually start the task.
- Increase connection timeout in Connor (#1358).
- Allow NetworkIn only with public IPv4 addresses (#1342).
- Network shaping (#1280).
This enables network traffic shaping on a Worker, making it possible to limit network bandwidth for each deal separately. Internally this is achieved using the Linux kernel traffic control mechanism and making it work together with Docker. In the first attempt we used policing, which drops excess packets, throttling TCP window sizes and reducing the overall output rate of affected traffic streams. Overly aggressive burst sizes (which are tricky to set properly) led to excess packet drops and throttled the overall output rate, particularly for TCP-based flows.
All of the above applies to the TBF (token bucket filter) classless queueing discipline, which is the easiest way to shape network traffic. An alternative approach is the HTB (hierarchical token bucket) queueing discipline, which is classful and allows building hierarchical rules for traffic shaping and policing. For egress traffic, intermediate functional block (IFB) devices, which have their own packet queueing, are used.
This makes it possible both to build hierarchical rules per packet type, network device, etc., and to restrict traffic for each container and/or for the entire Worker.
So far, shaping in both directions works without packet-drop spikes. Epic win. After an ask-plan is created, the following actions are performed:
- Create a bridge device with its own iptables rules, where all containers of a specific deal live. This bridge allows tasks to communicate with each other, but not with tasks spawned within another deal; for that purpose, you have to use overlay network drivers.
- Apply TC rules to that bridge to limit ingress traffic. Yep, ingress, because a container's egress traffic appears as ingress traffic to the Docker device.
- Create an IFB virtual device.
- Mirror egress traffic to that IFB by applying TC filters, and create another shaping rule for that traffic.
- Print order identity in CLI (#1286).
- Allow obtaining the balance of a specified wallet (#1294).
This commit extends the Node API so that the blockchain's "balanceOf" can be called with a given ETH address as a parameter. On the CLI side there is still a single method, but with an optional extra parameter.
- Network shaping for BTFS (#1299).
This commit activates network shaping for the BTFS volume plugin. Previously, BTFS ran on a separate bridge network that was not connected to the network on which the deal's tasks are executed. Now that we can shape a task's network traffic under the constraints specified in the supplier's order, this commit starts the BTFS container on the same network as the task it manages, so network limits apply properly.
- Prometheus metrics for Connor (#1296).
This adds Prometheus metrics that give us some insight into order and deal lifespans.
- Simulation mode in Optimus (#1308).
This commit allows Optimus to run in simulation mode, fetching only the specified orders from the marketplace. It also fixes a bug where Optimus removed ask-plans in dry-run mode.
- Update price changes in Optimus (#1307).
This teaches Optimus to track auto-accepted price changes for deals. Previously, it only considered the price specified in the ask-plan, not the actual one, which may change during the deal's lifetime.
- Default DWH config update (#1288).
- Tune dry-run in Optimus (#1301).
Now, when run with the dry-run option, Optimus shows the ask-plans that are candidates for removal.
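A minimal sketch of the dry-run rule, with a hypothetical `askPlan` type and `Obsolete` flag standing in for Optimus's real decision logic: removal candidates are always computed and reported, but the removal callback is only invoked outside dry-run mode.

```go
package main

import "fmt"

// askPlan is a hypothetical, trimmed-down ask-plan record.
type askPlan struct {
	ID       string
	Obsolete bool // hypothetical flag set by the optimizer
}

// removeObsolete returns the removal candidates and removes them only when
// dryRun is false, so a dry run never touches the Worker.
func removeObsolete(plans []askPlan, dryRun bool, remove func(id string) error) ([]string, error) {
	var candidates []string
	for _, p := range plans {
		if p.Obsolete {
			candidates = append(candidates, p.ID)
		}
	}
	if dryRun {
		return candidates, nil
	}
	for _, id := range candidates {
		if err := remove(id); err != nil {
			return candidates, fmt.Errorf("remove ask-plan %s: %w", id, err)
		}
	}
	return candidates, nil
}

func main() {
	plans := []askPlan{{ID: "a"}, {ID: "b", Obsolete: true}}
	ids, _ := removeObsolete(plans, true, func(string) error {
		panic("must not remove in dry-run mode")
	})
	fmt.Println("would remove:", ids) // would remove: [b]
}
```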
- Parse BID and ASK strictly (#1282).
This fixes a bug where a user could accidentally create a BID or ASK with fields that are unknown to the Node/Worker. Such behavior is a source of confusion and frustration, so we now forbid unknown fields in "sonmcli order create" and "sonmcli worker ask-plan create".
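In Go's standard library, strict decoding is a one-line switch on `encoding/json`'s `Decoder`. The sketch below uses JSON and a hypothetical `bidSpec` struct purely for illustration; the real order files and their parser may differ (for YAML input, a strict unmarshal mode plays the same role). A misspelled field such as `duraton` is rejected instead of being silently dropped.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bidSpec is a hypothetical subset of a BID order description.
type bidSpec struct {
	Price    string `json:"price"`
	Duration string `json:"duration"`
}

// parseBIDStrict rejects any field that bidSpec does not declare.
func parseBIDStrict(data []byte) (*bidSpec, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var bid bidSpec
	if err := dec.Decode(&bid); err != nil {
		return nil, fmt.Errorf("invalid BID: %w", err)
	}
	return &bid, nil
}

func main() {
	_, err := parseBIDStrict([]byte(`{"price": "0.5 USD/h", "duraton": "8h"}`))
	fmt.Println(err) // invalid BID: json: unknown field "duraton"
}
```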
- Properly calculate free devices in Optimus (#1285).
This fixes a bug where a field meant to be immutable was mutated during the calculation of free devices, which led to an incorrect free-device state. As a side effect, Optimus could perform an invalid optimization even when the optimal ask-plans already existed.
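The note does not say which field was mutated. As a hypothetical illustration only, a classic way to mutate a meant-to-be-immutable value in Go is filtering a slice in place, because the result shares the input's backing array. Both functions below are made up for this example.

```go
package main

import "fmt"

// freeInPlace is the buggy pattern: `free` shares the backing array of `all`,
// so filtering silently rewrites a slice the caller treats as immutable.
func freeInPlace(all []int, used map[int]bool) []int {
	free := all[:0]
	for _, id := range all {
		if !used[id] {
			free = append(free, id)
		}
	}
	return free
}

// freeCopy allocates a new slice and leaves `all` untouched.
func freeCopy(all []int, used map[int]bool) []int {
	free := make([]int, 0, len(all))
	for _, id := range all {
		if !used[id] {
			free = append(free, id)
		}
	}
	return free
}

func main() {
	used := map[int]bool{1: true}
	gpus := []int{0, 1, 2, 3}
	fmt.Println(freeCopy(gpus, used), gpus)    // [0 2 3] [0 1 2 3]
	fmt.Println(freeInPlace(gpus, used), gpus) // [0 2 3] [0 2 3 3]
}
```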
- Sudden EOF while connecting through NPP (#1253).
The problem was that the client and the server both successfully connected to each other, so there were at least two different connections. The server must serve both of them, later dropping the hanging one, while the client must return the first one and close the other. If we are the server, we just put the successful connection into a queue and return it on the next "Accept" call. If we are the client, we behave as before.
- Processor shutdown in DWH (#1270).
Hitting Ctrl-C could lead to a long and painful shutdown if the DB and other resources were closed before the last iteration of the processor loop. This is fixed by making sure that the whole processing loop iteration is locked, not just parts of it.
- Proper blacklisting in Connor (#1292).
This fixes the blacklist timer behavior on failures and successes. The timer is also reset if a user is removed from the blacklist on the market.
- Properly discover loopback links (#1297).
This fixes a bug where our components, such as Node and Worker, discovered loopback devices by resolving localhost. This sometimes caused problems on systems with IPv6 disabled but a misconfigured /etc/hosts.
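Discovering loopback links from the interface list rather than by resolving a host name can be sketched with Go's `net` package. This is an illustrative sketch, not the Node/Worker code, and `loopbackAddrs` is a hypothetical helper: it selects interfaces that are up and carry `net.FlagLoopback`, so /etc/hosts is never consulted.

```go
package main

import (
	"fmt"
	"net"
)

// loopbackAddrs lists the addresses of every loopback interface that is up,
// taken from the system's interface list, so /etc/hosts plays no role.
func loopbackAddrs() ([]net.IP, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, iface := range ifaces {
		if iface.Flags&net.FlagLoopback == 0 || iface.Flags&net.FlagUp == 0 {
			continue
		}
		addrs, err := iface.Addrs()
		if err != nil {
			return nil, err
		}
		for _, addr := range addrs {
			if ipNet, ok := addr.(*net.IPNet); ok {
				ips = append(ips, ipNet.IP)
			}
		}
	}
	return ips, nil
}

func main() {
	ips, err := loopbackAddrs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ip := range ips {
		fmt.Println(ip)
	}
}
```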
- Proper BTFS initialization (#1298).
This fixes errors being ignored while initializing the BTFS volume plugin.
- Proper JSON errors for task list cmd (#1300).
- Check BID identity level in Optimus (#1287).
This is required due to #1293.
- Proper order cancel in Connor (#1306).
- Proper own ask filtering in Optimus (#1312).
Optimus should no longer create plans for its own asks.
- Feat: BTFS volume plugin for Worker (#1274).
This commit adds the ability to spawn SONM tasks with a torrent filesystem attached as a volume. Internally, this is achieved using the BTFS application, which runs a torrent client and mounts itself as a filesystem via FUSE. It is a read-only solution designed to distribute large amounts of data across multiple tasks without a single point of failure: all tasks help each other share the data, which reduces network pressure and improves application scalability.
- Enable pprof for Node (#1272).
- Update gRPC library (#1258).
- Proper config for Connor (#1265).
- Proper path to Connor's binary in the systemd unit (#1271).
- Load keystore even if "--insecure" flag is set (#1276).
This commit fixes a problem when the "--insecure" flag was used with the CLI and a sub-command needed to load a key or obtain the default ETH address from the keystore. Now we load the keystore up front for every command; in insecure mode, only the creation of TLS credentials for the gRPC client connection is skipped.
- Show Order's price per hour (#1277).
- Connor, the auto-buy mining bot (#1227).
This introduces Connor, a bot that buys resources for mining. The current implementation:
- Places BID orders according to its config.
- Restores orders, deals, and tasks between restarts.
- Loads the actual price of the target token and creates orders according to that price and a marginality coefficient.
- Tracks token price changes using CoinMarketCap and re-creates orders if the price has changed sufficiently.
- Determines the actual hashrate from logs and mining pool statistics.
- Closes deals that deliver less hashrate than offered.
Note that the current implementation is tightly coupled to Ethereum mining on DwarfPool. Next, we'll make it more modular and configurable for other popular tokens and pools.
- Enable pprof in Optimus (#1255).
- Fast benchmark mapping in Optimus (#1256).
Added a smarter benchmark mapping algorithm that indexes benchmarks in a flat map backed by an array. This speeds up the genetic algorithm by 2-3 times.
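A hypothetical sketch of the idea, in which the benchmark IDs and the `benchIndex` type are illustrative only. Each benchmark ID gets a dense index once, sparse benchmark maps are flattened into slices, and the hot comparison loop of the genetic algorithm then uses only slice indexing instead of map lookups.

```go
package main

import "fmt"

// benchIndex assigns every benchmark ID a dense position, so benchmark
// values can live in a flat slice instead of a map per device.
type benchIndex struct {
	pos map[string]int
}

func newBenchIndex(ids []string) *benchIndex {
	pos := make(map[string]int, len(ids))
	for i, id := range ids {
		pos[id] = i
	}
	return &benchIndex{pos: pos}
}

// vector flattens a sparse benchmark map once; unknown IDs are ignored.
func (ix *benchIndex) vector(values map[string]uint64) []uint64 {
	v := make([]uint64, len(ix.pos))
	for id, val := range values {
		if i, ok := ix.pos[id]; ok {
			v[i] = val
		}
	}
	return v
}

// fits compares two flattened vectors with plain slice indexing, which is
// what a genetic algorithm's inner loop would call repeatedly.
func fits(required, available []uint64) bool {
	for i := range required {
		if required[i] > available[i] {
			return false
		}
	}
	return true
}

func main() {
	ix := newBenchIndex([]string{"cpu-cores", "ram-size", "gpu-count"})
	want := ix.vector(map[string]uint64{"ram-size": 4 << 30, "gpu-count": 1})
	have := ix.vector(map[string]uint64{"cpu-cores": 8, "ram-size": 16 << 30, "gpu-count": 2})
	fmt.Println(fits(want, have)) // true
}
```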
- Use proper combinations limits in Optimus (#1261).
It doesn't make sense to build combinations of k GPUs with k greater than n, the number of currently free GPUs.
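A sketch of the limit, enumerating k-element subsets of n free GPUs (by index) with k clamped to n. The `combinations` function is hypothetical and Optimus may bound k differently; the point is that for k > n the only meaningful combination is the one containing all n free GPUs.

```go
package main

import "fmt"

// combinations returns all k-element subsets of {0, ..., n-1} in
// lexicographic order; k is clamped to n, since larger k adds nothing.
func combinations(n, k int) [][]int {
	if k > n {
		k = n
	}
	if k < 0 {
		return nil
	}
	var res [][]int
	comb := make([]int, 0, k)
	var rec func(start int)
	rec = func(start int) {
		if len(comb) == k {
			res = append(res, append([]int(nil), comb...))
			return
		}
		// Stop early when too few indices remain to complete the subset.
		for i := start; i <= n-(k-len(comb)); i++ {
			comb = append(comb, i)
			rec(i + 1)
			comb = comb[:len(comb)-1]
		}
	}
	rec(0)
	return res
}

func main() {
	fmt.Println(combinations(4, 2)) // [[0 1] [0 2] [0 3] [1 2] [1 3] [2 3]]
	fmt.Println(combinations(3, 5)) // [[0 1 2]]
}
```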
- Avoid extra copying in Optimus (#1262).